We derive an asymptotic theory of nonparametric estimation for a time series regression model $Z_t = f(X_t) + W_t$, where $\{X_t\}$ and $\{Z_t\}$ are observed nonstationary processes and $\{W_t\}$ is an unobserved stationary process. In econometrics, this can be interpreted as a nonlinear cointegration type relationship, but we believe that our results are of wider interest. The class of nonstationary processes allowed for $\{X_t\}$ is a subclass of the class of null recurrent Markov chains. This subclass contains random walk, unit root processes and nonlinear processes. We derive the asymptotics of a nonparametric estimate of $f(x)$ under the assumption that $\{W_t\}$ is a Markov chain satisfying some mixing conditions. The finite-sample properties of $\hat f(x)$ are studied by means of simulation experiments.

1. Introduction. Two time series $\{X_t\}$ and $\{Z_t\}$ are said to be linearly cointegrated if they are both nonstationary and of unit root type and if there exists a linear combination $aX_t + bZ_t = W_t$ such that $\{W_t\}$ is stationary. This means that the series $\{X_t, Z_t\}$ move together when considered over a long period of time. The concept of cointegration was introduced by Granger [10] and further developed by Engle and Granger [6]. Since its introduction, there have been numerous papers in econometrics exploring its various aspects. Some of the main results are given in Johansen [19].

The long term relationships between two economic time series may not necessarily be linear, however, and the processes $\{X_t\}$ and $\{Z_t\}$ may not be linearly generated unit root processes. This has led to a search for nonlinear cointegration type relationships such as $Z_t = f(X_t) + W_t$, for some nonlinear function $f$ and some possibly nonlinearly generated input process $\{X_t\}$. Indeed, functional relationships of this type have been fitted to economic data (see, e.g., [8, 12]), but to our knowledge, the properties of the resulting nonparametric estimates have not been established (see [27] for a consistency property in a simplified situation, though). A brief discussion of the relationship between our work and recent contributions to the theory of nonlinear cointegration occurs in Section 6.

There are at least two difficulties (cf. [11] and others): which class of processes should be chosen as a basic class of nonstationary processes and how should an estimation theory for an estimate of $f$ be constructed? The main goal of this paper is to try to answer these questions, that is, we wish to establish a nonparametric estimation theory of the kernel estimator

(1.1)    $\hat f(x) = \dfrac{\sum_{t=0}^{n} Z_t K_{x,h}(X_t)}{\sum_{t=0}^{n} K_{x,h}(X_t)}$

of the function $f$ in the model

(1.2)    $Z_t = f(X_t) + W_t,$

where $K$ is a kernel function whose definition and properties are given in Section 2.1, $h$ is the bandwidth, $\{W_t\}$ is an unobserved stationary process and $\{X_t\}$ and $\{Z_t\}$ are observed processes which are nonstationary in a sense to be made precise later. At first, $\{X_t\}$ and $\{W_t\}$ will be assumed
to be independent processes, which is quite a natural assumption in a non- 
linear regression context. However, in a cointegration framework, this inde- 
pendence assumption is rather restrictive and is generally not fulfilled for 
linear cointegration models. In Section 4, dependence is the main subject. It 
turns out that dependence between {Xt} and {Wt} for fixed t may disappear 
asymptotically. The reason for this phenomenon is related to restrictions on 
the type of dependence which is possible between a stationary and a nonsta- 
tionary process. A stationary process cannot follow a nonstationary process 
too closely as this will violate the stationarity. 
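To make the object of study concrete, the following minimal Python sketch computes a kernel-weighted local average of the $Z_t$ around a point $x$ for a simulated random-walk regressor. It assumes the usual ratio (Nadaraya-Watson) form with $K_{x,h}(y) = h^{-1}K((y-x)/h)$; the Epanechnikov kernel, the i.i.d. Gaussian noise and all function names are illustrative choices, not taken from the paper.

```python
import math
import random

def epanechnikov(u):
    """Compactly supported kernel: nonnegative, integrates to 1, first moment 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_estimate(x, xs, zs, h, kernel=epanechnikov):
    """Ratio estimator: sum_t Z_t K_{x,h}(X_t) / sum_t K_{x,h}(X_t)."""
    weights = [kernel((xt - x) / h) / h for xt in xs]
    denom = sum(weights)
    if denom == 0.0:
        return float("nan")  # the chain has not visited the neighborhood of x
    return sum(w * z for w, z in zip(weights, zs)) / denom

def simulate(n, f, seed=0):
    """Random-walk X_t (started at 0) and Z_t = f(X_t) + W_t with i.i.d. noise W_t."""
    rng = random.Random(seed)
    xs, zs, x = [], [], 0.0
    for _ in range(n + 1):
        xs.append(x)
        zs.append(f(x) + rng.gauss(0.0, 0.5))
        x += rng.gauss(0.0, 1.0)
    return xs, zs

xs, zs = simulate(5000, lambda v: v * v)
estimate_at_zero = kernel_estimate(0.0, xs, zs, h=0.5)
```

The estimate is only defined once the walk has visited the window around $x$, which is why recurrence of $\{X_t\}$ matters for the asymptotics.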

Although the connection between (1.2) and the nonlinear cointegration problem is obvious, we would like to point out that the estimation of the function $f$ in the general context we are considering should also be of interest in other areas of application. In a traditional time series regression problem, some sort of mixing condition is often assumed for $\{X_t\}$ in order to obtain a central limit theorem for $\hat f(x)$. However, mixing assumptions on $\{X_t\}$ are ruled out in the general situation we consider. A minimal condition for undertaking asymptotic analysis on $\hat f(x)$ is that as the number of observations on $\{X_t\}$ increases, there must be infinitely many observations in any neighborhood of $x$. This means that $\{X_t\}$ must return to a neighborhood of $x$ infinitely often, which, in turn, implies that the framework of a recurrent Markov chain is especially convenient. Since $\{X_t\}$ may be nonstationary, null recurrent processes have to be included. It should be noted that the class of null recurrent processes contains unit root processes (cf. [23]). Unlike the parametric situation, where a unit root speeds up the convergence of (global) estimates due to the large spread of the observations, in the nonparametric case, which is concerned with local estimates, the nonstationarity slows down the convergence because the time until the process returns to the local neighborhood around $x$ increases, the expected time being infinite in the null recurrent case.

In [21, 22] (hereafter, the Karlsen and Tjøstheim paper [22] is referred to as KT), an asymptotic theory was developed for nonparametric estimation for a nonstationary univariate nonlinear model in the framework of so-called $\beta$-null recurrent processes. The latter constitute a subclass of the null recurrent processes which contains the random walk. For an alternative theoretical approach in the random walk case, we refer to [26]. For a relationship between the two approaches, see [2].

We will rely on central parts of the theory of KT in our derivations in 
this paper. But, a host of new problems emerges in the regression case, as 
will be made clear in the following. 

2. Notation and some basic conditions. We will follow the notation of KT since our proofs and results will be closely based on that paper. Thus, we denote by $\{X_t, t \ge 0\}$ a $\phi$-irreducible Markov chain on a general state space $(E, \mathcal E)$ with transition probability $P$. This means that there exists a nontrivial measure $\phi$ on $\mathcal E$ such that each $\phi$-positive set $A$ is communicating with the whole state space, that is, $\sum_n P^n(x, A) > 0$ for all $x \in E$ whenever $\phi(A) > 0$, $A \in \mathcal E$. In this paper, we take $E \subset \mathbb R$ and we denote the class of nonnegative measurable functions with $\phi$-positive support by $\mathcal E^+$. For a set $A \in \mathcal E$, we write $A \in \mathcal E^+$ if the indicator function $1_A \in \mathcal E^+$. The process $\{X_t, t \ge 0\}$ will be assumed to be Harris recurrent. This implies that given a neighborhood $\mathcal N_x$ of $x$ with $\phi(\mathcal N_x) > 0$, $\{X_t\}$ will return to $\mathcal N_x$ with probability one, this being what makes asymptotics for a nonparametric estimation possible. The chain is positive recurrent if there exists an invariant probability measure such that $\{X_t, t \ge 0\}$ is strictly stationary and is null recurrent otherwise. In this paper, we are primarily interested in the null recurrent situation, in which case there exists a (unique up to a constant, nonprobability) invariant measure, which will be denoted by $\pi$.

If $\eta$ is a nonnegative measurable function and $\lambda$ is a measure, then the kernel $\eta \otimes \lambda$ is defined by

$\eta \otimes \lambda(x, A) = \eta(x)\lambda(A), \qquad (x, A) \in (E, \mathcal E).$

If $H$ is a general kernel, the function $H\eta$, the measure $\lambda H$ and the number $\lambda H\eta$ are defined, respectively, by

$H\eta(x) = \int H(x, dy)\,\eta(y), \qquad \lambda H(A) = \int \lambda(dx)\,H(x, A), \qquad \lambda H\eta = \int \lambda H(dy)\,\eta(y).$

The convolution of two kernels, $H_1$ and $H_2$, gives another kernel, defined by

$H_1H_2(x, A) = \int H_1(x, dy)\,H_2(y, A).$

Due to associative laws, the number $\lambda H_1H_2\eta$ is uniquely defined. If $A \in \mathcal E$ and $1_A$ is the corresponding indicator variable, then $H1_A(x) = H(x, A)$. The kernel $I_g$ is defined by $I_g(x, A) = g(x)1_A(x)$ and the special case $g = 1_C$ is denoted $I_C$.

We define $\eta \in \mathcal E^+$ to be small if there exist a measure $\lambda$, a positive constant $b$ and an integer $m \ge 1$ such that

(2.1)    $P^m \ge b\,\eta \otimes \lambda.$

A set $A$ is said to be small if $1_A$ is small. Under quite broad conditions (cf. [9]), a compact set will be small. In this case, it follows from (2.1) that a $\phi$-positive subset of a compact set will be small. If $\lambda$ satisfies (2.1) for some $\eta$, $b$ and $m$, then $\lambda$ is a small measure.

A fundamental fact for $\phi$-irreducible Markov chains is the existence of a minorization inequality ([24], Theorem 2.1 and Proposition 2.6, pages 16-19): there exist a small function $s$, a probability measure $\nu$ and an integer $m_0 \ge 1$ such that

$P^{m_0} \ge s \otimes \nu.$

Some technical difficulties arise if $m_0 > 1$ because this necessitates the $m_0$-step chain; it is not a severe restriction to assume that $m_0 = 1$. Therefore, unless otherwise stated, in the sequel we will assume that the minorization inequality

(2.2)    $P \ge s \otimes \nu$

holds, where $s$ and $\nu$ are small and $\nu(E) = 1$. In particular, this implies that $0 \le s(x) \le 1$, $x \in E$. If (2.2) holds, then the pair $(s, \nu)$ is called an atom (for $P$). A wide class of nonlinear AR(1) processes satisfying (2.2) is given in KT.
From (2.2), we obtain the identity

(2.3)    $P(x, A) = (1 - s(x))\Big\{\dfrac{P(x, A) - s(x)\nu(A)}{1 - s(x)}\,1(s(x) < 1) + 1_A(x)\,1(s(x) = 1)\Big\} + s(x)\nu(A) \stackrel{\mathrm{def}}{=} (1 - s(x))Q(x, A) + s(x)\nu(A),$

so that the transition probability $P$ can be thought of as a mixture of the transition probability $Q$ and the small measure $\nu$. Since $\nu$ is independent of $x$, this means that the chain regenerates each time $\nu$ is chosen. This occurs with probability $s(x)$. The reasoning can be formalized by introducing the split chain $\{(X_t, Y_t)\}$, where the auxiliary chain $\{Y_t\}$ can only take values 0 and 1. Given that $X_t = x$ and $Y_{t-1} = y_{t-1}$, $Y_t$ takes the value 1 with probability $s(x)$ so that $\alpha = E \times \{1\}$ is a proper atom (cf. [24], page 51) for the split chain. We denote by

$S_\alpha = \min\{t \ge 1 : Y_t = 1\}$

the corresponding recurrence time. We will also make use of the consecutive sequence of recurrence times starting at time $t = 0$,

(2.8)    $\tau_k = \min\{t > \tau_{k-1} : Y_t = 1\}$ for $k \ge 0$, $\quad \tau_{-1} \stackrel{\mathrm{def}}{=} -1, \quad \tau = \tau_0,$

and the number of regenerations in the time interval $[0, n]$, that is,

$T(n) = \max\{k : \tau_k \le n\} \vee 0.$
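The mixture representation (2.3) can be checked mechanically on a toy example. The sketch below assumes a three-state transition matrix and a constant small function $s$ obtained from column minima; both are illustrative assumptions (a finite chain is positive recurrent, so this only illustrates the algebra of the split, not the null recurrent setting of the paper).

```python
def minorize(P):
    """Return (s, nu) with P[i][j] >= s * nu[j] for all i, j, using a constant s."""
    col_min = [min(row[j] for row in P) for j in range(len(P[0]))]
    s = sum(col_min)
    if s <= 0.0:
        raise ValueError("no one-step minorization with a constant s for this P")
    return s, [c / s for c in col_min]

def residual_kernel(P, s, nu):
    """Q(i, j) = (P(i, j) - s * nu(j)) / (1 - s): the non-regenerating part in (2.3)."""
    if s >= 1.0:
        raise ValueError("s = 1: every step regenerates and Q is not needed")
    return [[(p - s * v) / (1.0 - s) for p, v in zip(row, nu)] for row in P]

# Hypothetical transition matrix, chosen only for illustration.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
s, nu = minorize(P)
Q = residual_kernel(P, s, nu)
```

With probability $s$ the next state is drawn from $\nu$ regardless of the current state (a regeneration, $Y_t = 1$); otherwise it is drawn from the row of $Q$.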

An invariant measure $\pi_s$ can be defined in terms of the atom $(s, \nu)$ of (2.2). In fact (KT, Section 3.2),

(2.9)    $\pi_s = \nu G_{s,\nu}, \qquad G_{s,\nu} = \sum_{\ell=0}^{\infty} (P - s \otimes \nu)^{\ell}.$

If the measure $\pi_s$ is absolutely continuous with respect to Lebesgue measure, we denote by $p_s$ the corresponding density so that $p_s(x)\,dx = \pi_s(dx)$. Similarly, for $C \in \mathcal E^+$, we define the density $p_C(x) = p_s(x)/\pi_s 1_C$. For a $\pi_s$-integrable function $g$ on $\mathbb R$, we use the notation $\pi_s g$ for

$\pi_s g = \pi_s(g) = \int g(x)\,\pi_s(dx).$

Corresponding to $T(n)$, for a set $C \in \mathcal E^+$, the number of times $\{X_t\}$ visits $C$ up to time $n$ is denoted by

$T_C(n) = \sum_{t=0}^{n} 1_C(X_t).$

From KT (Remark 3.5) we have that $T_C(n)/T(n) \longrightarrow \pi_s 1_C$.

The kernel $G_{s,\nu}$ of (2.9) plays an important role in Section 3 and it easily follows from the above that for a $\pi_s$-integrable $g$ defined on $E$, with $E_x$ being the expectation conditional on $X_0 = x$,

(2.10)    $E_x \sum_{t=0}^{\tau} g(X_t) = G_{s,\nu}\,g(x).$

The minorization condition and the accompanying split chain permit the decomposition of the chain into separate and identical parts defined by the regeneration points. We have, for a function $g$,

(2.11)    $S_n(g) \stackrel{\mathrm{def}}{=} \sum_{t=0}^{n} g(X_t) = U_0 + \sum_{k=1}^{T(n)} U_k + U_{(n)},$

where

$U_k = \begin{cases} \sum_{t=\tau_{k-1}+1}^{\tau_k} g(X_t), & \text{when } k \ge 0, \\ \sum_{t=\tau_{T(n)}+1}^{n} g(X_t), & \text{when } k = (n). \end{cases}$

The sequence $\{(U_k, (\tau_k - \tau_{k-1})), k \ge 1\}$ consists of independent identically distributed (i.i.d.) random variables. This partition of the chain is of basic importance for the subsequent asymptotic analysis. In the following, we will sometimes use the symbol $U = U(g)$ to denote a random variable representing the common marginal distribution of $\{U_k, k \ge 1\}$.
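The bookkeeping behind (2.11) can be sketched in a few lines of Python. The regeneration indicators `ys` and the values `gx` (standing for $g(X_t)$) are hypothetical inputs; the function names are illustrative. When no regeneration has occurred by time $n$, this sketch simply returns the whole sum as the remainder.

```python
def regeneration_times(ys):
    """Times t with Y_t = 1, i.e. tau_0 < tau_1 < ... in the notation of (2.8)."""
    return [t for t, y in enumerate(ys) if y == 1]

def block_sums(values, taus):
    """Split sum_{t=0}^{n} values[t] into complete blocks U_0..U_{T(n)} and the tail U_(n)."""
    blocks, start = [], 0
    for tau in taus:
        blocks.append(sum(values[start:tau + 1]))  # t = tau_{k-1}+1, ..., tau_k
        start = tau + 1
    remainder = sum(values[start:])                # t = tau_{T(n)}+1, ..., n
    return blocks, remainder

ys = [0, 1, 0, 0, 1, 1, 0]   # hypothetical regeneration indicators Y_0..Y_6
gx = [1, 2, 3, 4, 5, 6, 7]   # hypothetical values g(X_0)..g(X_6)
blocks, remainder = block_sums(gx, regeneration_times(ys))
```

Here `blocks` holds $U_0, U_1, U_2$ and `remainder` holds $U_{(n)}$; their total reproduces $S_n(g)$.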

We must introduce a restriction on the way the process regenerates: the chain $\{X_t\}$ is $\beta$-null recurrent if there exist a small nonnegative function $f$, an initial measure $\lambda$, a constant $\beta \in (0, 1)$ and a slowly varying function $L_f$ such that

(2.12)    $E_\lambda \sum_{t=0}^{n} f(X_t) \sim \dfrac{1}{\Gamma(1+\beta)}\,n^{\beta}L_f(n)$

as $n \to \infty$. This condition is equivalent to (cf. KT, Theorem 3.1) a restriction on the tail distribution of the recurrence time $S_\alpha$, in that

(2.13)    $P_\alpha(S_\alpha > n) = \dfrac{1}{\Gamma(1-\beta)\,n^{\beta}L_s(n)}\,(1 + o(1)),$

where $L_s$ is a slowly varying function depending on $s$ and where $P_\alpha$ means that the initial distribution is equal to $\delta_\alpha(x, y)$, that is, $Y_0 = 1$, $X_0 = x$ arbitrary. In the sequel, (2.13) will be referred to as the tail condition. A random walk process is $\beta$-null recurrent with $\beta = 1/2$.
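The value $\beta = 1/2$ can be illustrated with the simple symmetric walk on the integers, used here purely because its return-time law is available in closed form; this is an assumption made for illustration and not the continuous-state setting of the paper. For that walk, started at 0, $P(\text{first return} > 2n) = P(X_{2n} = 0) = \binom{2n}{n}4^{-n} \sim (\pi n)^{-1/2}$, which has the form of (2.13) with $\beta = 1/2$ and $\Gamma(1/2) = \sqrt{\pi}$, and the expected number of visits to 0 grows like $2\sqrt{n/\pi}$, in line with (2.12).

```python
import math

def no_return_probs(n_max):
    """p[n] = C(2n, n) / 4**n = P(first return to 0 occurs after time 2n) = P(X_{2n} = 0)."""
    p = [1.0]
    for n in range(n_max):
        p.append(p[-1] * (2 * n + 1) / (2 * n + 2))  # ratio of consecutive terms
    return p

def expected_visits(n_max):
    """Expected number of visits to 0 during times 0, ..., 2 * n_max (start at 0)."""
    return sum(no_return_probs(n_max))

def tail_ratio(n):
    """Tail probability divided by its claimed asymptotic form (pi * n) ** -0.5."""
    return no_return_probs(n)[n] * math.sqrt(math.pi * n)

def visits_ratio(n):
    """Expected visits divided by the claimed growth 2 * sqrt(n / pi)."""
    return expected_visits(n) / (2.0 * math.sqrt(n / math.pi))
```

Both ratios approach 1 as $n$ grows, so the return-time tail decays like $n^{-1/2}$ while the occupation count grows like $n^{1/2}$.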
2.1. Basic conditions. We denote by $h = h_n$ the bandwidth used in the nonparametric estimation. It is assumed to satisfy $h_n \to 0$ and, with no loss of generality, we also assume that $h_n < 1$. Let $K : \mathbb R \to \mathbb R$ be a kernel function and for a fixed $x$, let $K_{x,h}(y) = h^{-1}K((y - x)/h)$, $\mathcal N_x(h) = \{y : K_{x,h}(y) \ne 0\}$ and $\mathcal N_x = \mathcal N_x(1)$. In our context, a locally bounded function will be taken to mean a function bounded in a neighborhood of $x$ and a locally continuous function is a function continuous at the point $x$. Without loss of generality, we may assume that this neighborhood equals $\mathcal N_x$ and that local continuity implies local boundedness. This follows since $\mathcal N_x(h) = x \oplus h\mathcal N_0$.

We will consider the problem of evaluating the properties of the kernel estimator (1.1) of the function $f$ of (1.2) under the assumption that $\{W_t\}$ is Markov. In Section 3, $\{X_t\}$ and $\{W_t\}$ are assumed to be independent. The independence assumption is removed in Section 4, and the compound process $\{(X_t, W_t)\}$ is assumed to be Markov.

The following set of conditions is always assumed:

B0 (i) the kernel $K$ is nonnegative, $\int K(u)\,du < \infty$ and $\|K\|_2^2 = \int K^2(u)\,du < \infty$;
   (ii) the $\{X_t\}$ process is a Harris recurrent Markov chain;
   (iii) the transfer function $f$ is continuous at the point $x$.

We will also make heavy use of the following conditions B1-B4 of KT. For ease of reference, these conditions are restated here:

B1 (i) $\int K(u)\,du = 1$;
   (ii) $\int uK(u)\,du = 0$;
B2 (i) the support $\mathcal N_0$ of the kernel is contained in a compact set;
   (ii) the kernel is bounded and $\mathcal N_x$ is a small set;
B3 the invariant measure $\pi_s$ has a locally continuous density $p_s$ which is locally strictly positive, that is, $p_s(x) > 0$;
B4 for all $\{A_h\} \in \mathcal E$ such that $A_h \downarrow \emptyset$, $\lim_{h \downarrow 0} \overline{\lim}_{y \to x} P(y, A_h) = 0$.
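The kernel conditions in B0(i), B1 and B2(i) are easy to verify numerically for a concrete choice. The sketch below uses the Epanechnikov kernel, an assumed example (the paper does not prescribe a specific kernel here), and a midpoint rule; the helper names are illustrative.

```python
def epanechnikov(u):
    """Nonnegative, bounded kernel supported on the compact set [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def integrate(func, a=-1.0, b=1.0, steps=20000):
    """Midpoint rule on [a, b]; adequate for a smooth, compactly supported integrand."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))

mass = integrate(epanechnikov)                            # B1(i): integral of K
first_moment = integrate(lambda u: u * epanechnikov(u))   # B1(ii): integral of u K(u)
l2_norm_sq = integrate(lambda u: epanechnikov(u) ** 2)    # B0(i): integral of K^2
```

For this kernel the three integrals are $1$, $0$ and $3/5$, so B0(i) and B1 hold, and the support $[-1, 1]$ is compact as required by B2(i).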

In all of the proofs, we use $c_1, c_2, \ldots$ as a sequence of generic constants and if $\{a_n\}$ and $\{b_n\}$ are two real-valued strictly positive sequences, then we write $a_n \ll b_n$ if $a_n = o(b_n)$. The associated $\sigma$-algebra, $\mathcal F_t^X$, for a stochastic process $\{X_t, t \ge 0\}$ is defined in the usual way: $\mathcal F_t^X = \sigma\{X_j, j \le t\}$ and

$\mathcal F^X = \bigvee_t \mathcal F_t^X.$

3. Nonparametric estimation of $f$. At the outset, we assume that $\{W_t\}$ is a $\phi$-irreducible ergodic Markov chain which satisfies (2.2). Additional assumptions will be introduced as needed. Actually, we also allow a slight generalization of (1.2), in that we include an instantaneous transformation of $W_t$, resulting in

(3.1)    $Z_t = f(X_t) + g_W(W_t).$

This is an extension of (1.2) since even if $\{W_t\}$ is Markov, $\{W'_t\} = \{g_W(W_t)\}$ does not have to be Markov. We assume that $E\,g_W(W_0) = 0$. Because we do not generally restrict $g_W$ to be a small function (consider, e.g., $g_W(w) = w$), Lemma 5.1 and Lemma 5.2 of KT cannot be used, which complicates matters considerably.

Throughout this section, we make the assumption that $\{X_t\}$ and $\{W_t\}$ are independent and, using this assumption, we are able to obtain results which are of interest in the general context of nonparametric estimation of nonstationary processes. In Section 4, we allow for dependence, but put restrictions on $g_W$, and some parts of the results obtained in this section are extended. Moreover, our findings in Section 4 highlight the fact that the actual dependence occurring in cointegration models disappears asymptotically. In this way, results in this section are also relevant to cointegration models. Furthermore, they may serve as a starting point for deriving asymptotic results for the dependent case without the restrictions on $g_W$ which are imposed in Section 4. We believe that letting $W'_t = g_W(X_t, \ldots, X_{t-p}, W_t)$ for some fixed $p$, where $\{W_t\}$ is a Markov process, independent of $\{X_t\}$ and such that $\{W'_t\}$ is stationary, may be a possible way to proceed.

We start by expressing $\hat f(x) - f(x)$ in the $U$-notation of (2.11), and this is done by rewriting the numerator of $\hat f(x)$ of (1.1) as

$Z_t = g_W(W_t) + (f(X_t) - f(x)) + f(x),$

$Z_t K_{x,h}(X_t) = g_h(X_t, W_t) + \psi_x(X_t)K_{x,h}(X_t) + f(x)K_{x,h}(X_t),$

where $g_h(z, u) = g_W(u) \cdot K_{x,h}(z)$ and $\psi_x(y) = f(y) - f(x)$. By the definition of $\hat f(x)$, this gives

$\hat f(x) - f(x) = S_n^{-1}(K_{x,h})\{S_n(g_h) + S_n(\psi_x \cdot K_{x,h})\}.$

The last term on the right-hand side represents the bias. It is a stochastic quantity and we want to replace it by a deterministic bias term. Let

$a_h \stackrel{\mathrm{def}}{=} \dfrac{\pi_s I_{K_{x,h}}\psi_x}{\pi_s K_{x,h}}, \qquad b_h \stackrel{\mathrm{def}}{=} I_{K_{x,h}}(\psi_x - a_h).$

Then

$\hat f(x) - f(x) - a_h = S_n^{-1}(K_{x,h})\{S_n(g_h) + S_n(\psi_x \cdot K_{x,h}) - a_h S_n(K_{x,h})\} = S_n^{-1}(K_{x,h})\{S_n(g_h) + S_n(b_h)\}.$
It follows that

(3.2)    $h^{1/2}S_n^{1/2}(K_{x,h})\{\hat f(x) - f(x) - a_h\} = \Delta^1_{n,h} + \Delta^2_{n,h},$

where

$\Delta^1_{n,h} = S_n^{-1/2}(K_{x,h})\,h^{1/2}S_n(g_h),$

$\Delta^2_{n,h} = \{\hat p_C(x)\}^{-1/2}\,T_C^{-1/2}(n)\,h^{1/2}S_n(b_h),$

$\hat p_C(x) = T_C^{-1}(n)\,S_n(K_{x,h})$

and where $C$ is a purely auxiliary small set. Replacing $P\xi$ with $f$ in the proof of Theorem 5.4 in KT, then using B1-B3 and condition (2.13), we have $\Delta^2_{n,h} = o_P(1)$ and by KT (the second part of Theorem 5.3 and also the proof of Theorem 5.4), $\hat p_C(x) = p_C(x) + o_P(1)$ since $\pi_s b_h = 0$.

By (3.2), the above arguments show that a central limit theorem for $\hat f(x)$ follows from a central limit theorem for $\Delta^1_{n,h_n}$. We continue the proof of the asymptotic properties of $\hat f$ by formulating a general nonparametric CLT.
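The algebraic identity behind this decomposition, $\hat f(x) - f(x) = S_n^{-1}(K_{x,h})\{S_n(g_h) + S_n(\psi_x K_{x,h})\}$, can be checked numerically on simulated data. The sketch below assumes $g_W(w) = w$, i.i.d. Gaussian $W_t$, a Gaussian random walk for $X_t$ and an Epanechnikov kernel; these choices and all names are illustrative assumptions.

```python
import random

def kernel(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def k_xh(y, x, h):
    """K_{x,h}(y) = K((y - x) / h) / h."""
    return kernel((y - x) / h) / h

def decomposition_gap(x, h, n=2000, seed=1, f=lambda v: v ** 2):
    """Difference between f_hat(x) - f(x) and {S_n(g_h) + S_n(psi_x K)} / S_n(K)."""
    rng = random.Random(seed)
    xs, ws, pos = [], [], 0.0
    for _ in range(n + 1):
        xs.append(pos)                      # random walk started at 0
        ws.append(rng.gauss(0.0, 1.0))      # noise, independent of the walk
        pos += rng.gauss(0.0, 1.0)
    zs = [f(xt) + wt for xt, wt in zip(xs, ws)]
    s_k = sum(k_xh(xt, x, h) for xt in xs)
    f_hat = sum(z * k_xh(xt, x, h) for xt, z in zip(xs, zs)) / s_k
    s_g = sum(wt * k_xh(xt, x, h) for xt, wt in zip(xs, ws))          # S_n(g_h)
    s_psi = sum((f(xt) - f(x)) * k_xh(xt, x, h) for xt in xs)         # S_n(psi_x K)
    return (f_hat - f(x)) - (s_g + s_psi) / s_k

gap = decomposition_gap(0.0, 0.5)
```

Since the walk starts at $x = 0$, the denominator $S_n(K_{x,h})$ is strictly positive, and `gap` vanishes up to rounding error.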



3.1. A nonparametric CLT for null recurrent processes. Assume that $\{X_t\}$ is a general Markov chain [e.g., it could be identified with the compound chain $\{(X_t, W_t)\}$ or with just one of the components] which satisfies the minorization condition (2.2) and the tail condition (2.13). Let (assuming first- and second-order moments exist)

$U_0 = U_0(g_h) = \sum_{t=0}^{\tau_0} g_h(X_t), \qquad \mu(g_h) = E\,U(g_h),$

$\sigma^2(g_h) = E\,U^2(g_h) - \mu^2(g_h),$

where $g_h$ is a real-valued function defined on $E$ for all $h > 0$ and $\tau = \tau_0$ is defined as in (2.8). Note that with the function used in this paper, the random variables $U_1(g_h), U_2(g_h), \ldots$ in the decomposition (2.11) are independent so that the variance term in the notation of equation (4.4) of KT coincides with $\sigma^2(g_h)$. Consider the following conditions, where $-\infty < \mu, \mu' < \infty$, $0 < \sigma, \sigma' < \infty$, $v \in [0, 1]$, $m \ge 2$, $\varepsilon > 0$, $0 < d_m, d'_m < \infty$, $\beta$ is defined in (2.13), $\lambda$ is an initial measure and $h \downarrow 0$.



C1  $\mu(g_h) = \mu + o(1)$, $\mu(|g_h|) = \mu' + o(1)$.
C2  $h\sigma^2(g_h) = \sigma^2 + o(1)$.
C3  $h\sigma^2(|g_h|) = \sigma'^2 + o(1)$.
C4  $E|U(g_h) - \mu(g_h)|^{2m} \le d_m h^{-2m+v}$.
C5  $E|U(|g_h|) - \mu(|g_h|)|^{2m} \le d'_m h^{-2m+v}$.
C6  $h^{-1} \ll n^{\beta\delta_m - \varepsilon}$, $\delta_m = \ldots$
C7  $\exists\, g_0 : h|g_h| \le g_0$ and $P_\lambda(U_0(g_0) < \infty) = 1$.

The following theorem is essentially a translation of a CLT result in KT. 
It will be used to prove the main CLT results of the present paper. 
Theorem 3.1. Let $C$ be a small set. Assume that the tail condition (2.13) and C1-C6 hold with $\mu = 0$ for an $m \ge 2$ and a $v \in [0, 1]$. Then for any initial measure $\lambda$ for $X_0$ such that C7 holds,

$h^{1/2}\,T_C^{-1/2}(n)\{S_n(g_h) - T_C(n)\,\pi_s^{-1}(C)\,\mu(g_h)\} \stackrel{d}{\longrightarrow} N(0, \sigma^2\pi_s^{-1}(C)).$

Proof. The proof is essentially based on KT (Theorem 4.2). Since $g_h$ is a function of one variable, the conditions in that theorem simplify. Clearly, conditions A0-A2 of Theorem 4.2 of KT follow directly from C1-C3. In conditions C4 and C5 the quantity $v$ is allowed to vary everywhere in $[0, 1]$, whereas in conditions A3 and A4 of KT, $v$ can only take the values 0 and 1. However, this extension is allowed by a trivial modification of the first part of the proof of Theorem 4.1 of KT. Condition A5 of Theorem 4.2 of KT follows straightforwardly from C7 by reasoning as in the proof of Theorems 5.1 and 5.3 of KT. □



Before we can employ Theorem 3.1, we need to analyze the regeneration 
structure of {(Xt,Wt)} more carefully. This is done in a series of lemmas in 
Sections 3.2-3.8. We believe that these results are of independent interest 
and that they are potentially useful in other situations. Our main result is 
stated in Section 3.9. 

3.2. Decomposition of $S_n(g)$. We assume that the compound chain $\{(X_t, W_t)\}$ satisfies (2.2) so that it can be extended by the split chain method, with $\{(X_t, W_t, Y_t)\}$ being a split chain. Note that if $\{X_t\}$ and $\{W_t\}$ separately satisfy the minorization inequality (2.2), it is not obvious that the compound chain $\{(X_t, W_t)\}$ will. However, if $\{X_t\}$ and $\{W_t\}$ are independent, then it is trivial to verify (2.2), as is shown at the beginning of Section 3.3. Let

$\tau_k = \inf\{t > \tau_{k-1} : Y_t = 1\}, \qquad k \ge 0, \quad \tau_{-1} = -1.$

Then the sequence $\{\tau_k\}$ represents the regeneration times for the compound process. The basic decomposition, (2.11), with $g = g_h$ defined at the beginning of this section, gives

(3.3)    $S_n(g) = U_0(g) + \sum_{k=1}^{T(n)} U_k(g) + U_{(n)}(g), \qquad T(n) = \sup\{k : \tau_k \le n\} \vee 0,$

where

$U_k(g) = \begin{cases} \sum_{t=\tau_{k-1}+1}^{\tau_k} g(X_t, W_t), & \text{for } k \ge 0, \\ \sum_{t=\tau_{T(n)}+1}^{n} g(X_t, W_t), & \text{for } k = (n). \end{cases}$

According to the general theory, the variables $\{(U_k(g), (\tau_k - \tau_{k-1})), k \ge 1\}$ are i.i.d. We denote by $U = U(g)$ a random variable having the common marginal distribution of the $U_k$'s and write $\mu(g) = E\,U = E_\nu U_0(g)$, $\sigma^2(g) = \mathrm{Var}(U) = \mathrm{Var}_\nu(U_0(g)) = E_\nu U_0^2(g) - \mu^2(g)$, where $\nu$ refers to the compound chain $\{(X_t, W_t)\}$.

Our first problem is to find conditions which ensure that $\mu(g)$ and $\sigma^2(g)$ are finite. Again, by reference to the general theory [cf. Appendix A, (A.11) and (A.12)], we have that, with $s$ referring to the compound chain,

(3.4)    $\mu(g) = \pi_s g, \qquad \sigma^2(g) = \pi_s g^2 + 2\pi_s I_g H G_{s,\nu}\,g - \pi_s^2 g,$

where $H = P - s \otimes \nu$ and $G_{s,\nu}$ is defined as in (2.9).

The conditions ensuring $\sigma^2(g) < \infty$ are not evident from (3.4) if we want to avoid the relatively strong restriction that $g_W$ is a small function. If $g_W(w) = w$, then requiring $g_W$ to be small is roughly equivalent to $\phi$-mixing, which is not satisfied for, say, an autoregressive process. The problem is linked to the term $G_{s,\nu}$. In fact, we also need to demonstrate the existence of higher moments and to verify conditions connected to the bandwidth as seen in C1-C7.

3.3. $\beta$-null recurrence for the compound process. Let $P$ denote the transition probability for the Markov process $\{(X_t, W_t)\}$. We label quantities associated with $\{X_t\}$ by 1 and with $\{W_t\}$ by 2. The transition probability $P$ satisfies (2.2) when $P_1$ and $P_2$ do since

(3.5)    $P = P_1 \otimes P_2 \ge (s_1 \otimes s_2) \otimes (\nu_1 \otimes \nu_2) = s \otimes \nu.$

Condition (3.5) will be assumed to hold in the following.
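The product structure in (3.5) can be verified entry by entry on a toy example. The sketch below assumes two small finite transition matrices and constant small functions obtained from column minima; these are illustrative assumptions and only demonstrate the inequality, not the null recurrent setting.

```python
def minorize(P):
    """Return (s, nu) with P[i][j] >= s * nu[j] for all i, j, using a constant s."""
    col_min = [min(row[j] for row in P) for j in range(len(P[0]))]
    s = sum(col_min)
    return s, [c / s for c in col_min]

def product_kernel(P1, P2):
    """Transition matrix of two independent chains; state (i, k) has index i * len(P2) + k."""
    return [[p1 * p2 for p1 in row1 for p2 in row2] for row1 in P1 for row2 in P2]

# Hypothetical marginal transition matrices, chosen only for illustration.
P1 = [[0.6, 0.4], [0.3, 0.7]]
P2 = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]

s1, nu1 = minorize(P1)
s2, nu2 = minorize(P2)
P = product_kernel(P1, P2)
s = s1 * s2                               # s = s_1 (x) s_2
nu = [a * b for a in nu1 for b in nu2]    # nu = nu_1 (x) nu_2
holds = all(p >= s * v - 1e-12 for row in P for p, v in zip(row, nu))
```

The flag `holds` confirms $P \ge s \otimes \nu$ for the pair chain, with the regeneration probability being the product of the marginal ones.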

Lemma 3.1. Assume that $\{X_t\}$ and $\{W_t\}$ are independent, that the tail condition (2.13) holds for $\{X_t\}$ and that $\{W_t\}$ is ergodic. Then the compound process $\{(X_t, W_t)\}$ is $\beta$-null recurrent, that is, the tail condition holds for the compound process.

Proof. Let $C_1$ and $C_2$ be small sets and let $\nu = \nu_1 \otimes \nu_2$. Then

(3.6)    $E_\nu\Big\{\sum_{t=0}^{n} 1_{C_1}(X_t)1_{C_2}(W_t)\Big\} = \sum_{t=0}^{n} (\nu_1P_1^t1_{C_1})(\nu_2P_2^t1_{C_2}) = \pi_2 1_{C_2}\sum_{t=0}^{n} \nu_1P_1^t1_{C_1} + \sum_{t=0}^{n} b_t\,\nu_1P_1^t1_{C_1},$

where $b_t = \nu_2P_2^t1_{C_2} - \pi_2 1_{C_2}$ and where $\pi_2$ is the stationary measure for $\{W_t\}$. Since $\{W_t\}$ is ergodic, $b_t = o(1)$. Since $\{X_t\}$ is $\beta$-null [cf. KT, Lemma 3.1 and formulas (3.12) and (3.13)], we have that

$\sum_{t=0}^{n} \nu_1P_1^t1_{C_1} = (\pi_{s_1}1_{C_1})\,\psi_1(n)(1 + a_n), \qquad \psi_1(n) = n^{\beta}L_{s_1}(n), \quad a_n = o(1).$
By (2.12), the conclusion of the lemma follows if we can show that the second term of (3.6) is $o(\psi_1(n))$. Let $\psi_M = \sup_{t \le M} \psi_1(t)$, $A = \sup_t |a_t|$, $c_1 = \pi_{s_1}1_{C_1}$, $B = \sup_t |b_t|$ and $B_M = \sup_{t > M} |b_t|$. Then for all $M > 0$,

(3.7)    $\dfrac{\big|\sum_{t=0}^{n} b_t\,\nu_1P_1^t1_{C_1}\big|}{\psi_1(n)} \le c_1\Big\{B\,\dfrac{\psi_M(1 + A)}{\psi_1(n)} + B_M(1 + |a_n|)\Big\}.$

Letting $n$ tend to infinity and then letting $M$ tend to infinity, we find that the left-hand side of (3.7) is $o(1)$ with respect to $n$. □

3.4. Refinement of the decomposition structure. We extend both chains with the split chain method and write $\{(X_t, Y_t^1)\}$ and $\{(W_t, Y_t^2)\}$. Due to independence, $\{(X_t, W_t, Y_t)\}$ is the split chain for the compound process $\{(X_t, W_t)\}$, where $Y_t = Y_t^1Y_t^2$ (cf. [24], (4.17), page 62). Thus,

(3.8)    $\tau_k = \inf\{t > \tau_{k-1} : Y_t^1 = Y_t^2 = 1\}, \qquad k \ge 0, \quad \tau_{-1} = -1.$

We shall now look more closely at the decomposition structure and try, to some extent, to reduce it to the marginal decomposition of the $\{X_t\}$-process, that is, the regenerations defined by $\{\tau_k^1\}$,

(3.9)    $\tau_k^1 = \inf\{t > \tau_{k-1}^1 : Y_t^1 = 1\}, \qquad k \ge 0, \quad \tau_{-1}^1 = -1,$

which defines the $X$-partition. Let

(3.10)    $V_j = V_j(g) = \sum_{t=\tau_{j-1}^1+1}^{\tau_j^1} g(X_t, W_t), \qquad j \ge 0.$

Although the $V_j$'s are neither unconditionally nor conditionally independent, they will be useful. By (3.8), we see that the regeneration times for the compound chain are also regeneration times for $\{X_t\}$. Hence, following the regeneration times (3.9) for the $X$-process, we recover all of the simultaneous regeneration times given by (3.8). The gaps between successive simultaneous regeneration times define a subdivision of each $U_k$ into $V_j$'s and this refines the decomposition given by (3.3). Let

$T_k = \inf\{j > T_{k-1} : Y^2_{\tau_j^1} = 1\}$ for $k \ge 0$, $\quad T_{-1} = -1, \quad T \stackrel{\mathrm{def}}{=} T_0.$

Then $\tau_k = \tau^1_{T_k}$, which gives

$U_k = \sum_{t=\tau_{k-1}+1}^{\tau_k} g(X_t, W_t) = \sum_{j=T_{k-1}+1}^{T_k}\ \sum_{t=\tau_{j-1}^1+1}^{\tau_j^1} g(X_t, W_t) = \sum_{j=T_{k-1}+1}^{T_k} V_j$

and in particular for $k = 0$,

(3.11)    $U_0 = \sum_{j=0}^{T} V_j.$

The number of subblocks inside a large block is distributed as the recurrence time for the ergodic process $\{W_{\tau_k^1}\}$. Comparing this distribution with $\tau$ and $\tau^1$, it is evident that a block which is quite large is partitioned into relatively few subblocks. The advantage of this construction is that the subblocks are defined by the regeneration times for the $X$-process and the $X$-part of $g_h$ is marginally a small function.
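The relation between the two partitions can be traced on a short hand-made example. In the sketch below the indicator sequences `y1`, `y2` and the values `g` (standing for $g(X_t, W_t)$) are hypothetical inputs, and only complete blocks are formed; the names are illustrative.

```python
def hits(ys):
    """Times at which the indicator sequence equals 1."""
    return [t for t, y in enumerate(ys) if y == 1]

def block_sums(values, ends):
    """Sums of consecutive blocks of `values`, each block ending at an index in `ends`."""
    out, start = [], 0
    for end in ends:
        out.append(sum(values[start:end + 1]))
        start = end + 1
    return out

y1 = [1, 0, 1, 1, 0, 1, 1]   # Y_t^1: regeneration indicators of the X-chain
y2 = [0, 1, 1, 0, 1, 1, 0]   # Y_t^2: regeneration indicators of the W-chain
g = [1, 2, 3, 4, 5, 6, 7]    # g(X_t, W_t)

tau1 = hits(y1)                                    # X-regeneration times, as in (3.9)
tau = hits([a * b for a, b in zip(y1, y2)])        # simultaneous regenerations, as in (3.8)
T = [j for j, t in enumerate(tau1) if y2[t] == 1]  # indices T_k with Y^2 = 1 at tau1[j]
V = block_sums(g, tau1)                            # small blocks V_j, as in (3.10)
U = block_sums(g, tau)                             # large blocks U_k
U_from_V = block_sums(V, T)                        # U_k rebuilt from V_{T_{k-1}+1}..V_{T_k}
```

In this example `tau1` is `[0, 2, 3, 5, 6]`, `T` is `[1, 3]`, and both `U` and `U_from_V` equal `[6, 15]`, matching $\tau_k = \tau^1_{T_k}$ and $U_k = \sum_{j=T_{k-1}+1}^{T_k} V_j$.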

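3.5. The embedded process. The following lemma proves that embedding the $\{X_t\}$-regeneration times into $\{W_t\}$ and extending to a split chain are essentially commutative operations.

Lemma 3.2. The process $\{W_{\tau_k^1}, k \ge 0\}$ is a Markov process with transition probability $\tilde P = P_2Q_{\nu_1}$, where $Q_{\nu_1} = \sum_{\ell=0}^{\infty}\{\nu_1(P_1 - s_1 \otimes \nu_1)^{\ell}s_1\}P_2^{\ell}$. Moreover,

(3.12)    $\tilde P \ge \tilde s \otimes \tilde\nu,$

with $(\tilde s, \tilde\nu) = (s_2, \nu_2Q_{\nu_1})$. Let $\lambda = \lambda_1 \otimes \lambda_2$ be the initial measure for $\{(X_t, W_t)\}$. Let $\{\tilde{\mathbb W}_k\} = \{(\tilde W_k, \tilde Y_k)\}$ be the split chain generated by $\tilde P$ and $(\tilde s, \tilde\nu)$ and let $\{\mathbb W_t\} = \{(W_t, Y_t^2)\}$. Then

(3.13)    $\{\mathbb W_{\tau_k^1}\} \stackrel{d}{=} \{\tilde{\mathbb W}_k\}$

when the initial measure for $\tilde W_0$ is $\tilde\lambda = \lambda_2Q_{\lambda_1}$. In particular, let $\tilde T$ denote the first regeneration time for $\{\tilde{\mathbb W}_k\}$. Then the occupation time formula is given by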



7- T ( AG 1a, in general, 

(3.14) E^1a(^)=E x ^1a^ = L?U ifX 2 = W2, 



k=0 k=0 

where G s v = Y.T=v{P ~s®vY. 



7t S2 1a, if\ = v, 



The proof is given in Appendix B. 

Intuitively, changing the time parameter from $\{k\}$ to $\{\tau_k^1\}$ in the ergodic process $\{W_t\}$ should decrease the amount of dependence, and this is the content of our next result. More specifically, we obtain that the rate of convergence of the transition probability toward the stationary measure is at least as good for the embedded $\{\tilde W_k\}$-process as for the $\{W_t\}$-process.



Lemma 3.3. Suppose that $\{W_t\}$ is geometric ergodic. Then this is also true for the embedded chain $\{\tilde W_k\}$ of Lemma 3.2. If $\{W_t\}$ is strongly mixing with mixing rate defined by $\alpha = \{\alpha_t\}$, then $\{\tilde W_k\}$ is strongly mixing with mixing rate $\tilde\alpha$, which is equal to or faster than $\alpha$. In particular, for an integer p>0,
oo 

(3.15) £Va/<oo E^ 2 T p+1 <oo. 
t=\ 

The proof is given in Appendix B. 

3.6. Moment bounds. Our nonparametric CLT requires bounds for the moments of $U(g)$ given by C4 and C5. We first need to find upper bounds for moments of $U(g)$ corresponding to (3.11) and related quantities. We assume that

(3.16)    $g(x, w) = (g_X \otimes g_W)(x, w) = g_X(x)g_W(w).$

Our method is to use a representation of $U_0(g)$ as a partial sum of $V$'s, these variables being defined by the regeneration of $\{X_t\}$.

In the following, $H_j = P_j - s_j \otimes \nu_j$ for $j = 1, 2$ and as before, $H = P - s \otimes \nu$. Also, recall that $I_g$ is defined by $I_g(x, A) = g(x)1_A(x)$.

Theorem 3.2. Let $m \ge 1$ and $V_j$ be defined by (3.10) and (3.16). Then

(3.17)    $E_\nu\sum_{j=0}^{T} |V_j|^m \le \pi_{s_2}|g_W|^m\ E\,U^m(|g_X|).$

For all p> and 5 £ (0, oo) , 

(3.18) E| U(g) \ p < EV ^ ( ^ \Vj ) K s J {1+5) 1^+1 } . 

[j=o J 



The proof will be based on two lemmas. We use the notation Sj = rj — r-_ 1 
for j > and Hj = V V F w ' V T Y ^ . Then Vj is measurable Hj, and 
{% > j} = {T> j} £ Hj-i. By (3.11), Uq = ET=o v A( r > j) and for m > h 
E x {Vpi(T>j)}=E x {l(T>j)E x [VJ n | Hj-!]}. 

The following technical result, which is the first step in the proof of Theorem 3.2, uses the independence of $\{X_t\}$ and $\{W_t\}$, together with the regeneration property of $\{X_t\}$.

Lemma 3.4 [Decoupling]. Let $\lambda = \lambda_1 \otimes \lambda_2$. Let $j \ge 0$ be fixed and let $\{X'_t\}$ be an independent copy of $\{X_t\}$ so that $\{X'_t\}$ is independent of both $\{X_t\}$ and $\{W_t\}$. Let $\zeta_W$ be a real-valued function defined on $\mathbb R \times \{0, 1\}$ and for fixed $j$, let

4=£ w (W T i_ i+e+1 ,Y?tJ, £>Q,Y^=y 
and let Vjcj be an extension of (3.10), given by 

(3.19) V U = 9x{X t )Cw(W t ,Y^ i ), j>0. 

t=r/_ 1+ l 

Then for m > 1, 

EA^IWj-l}: 

w/iere U {a,g x ) = T,£ =0 9x{X' e )a e and a = {a e } = {a\] 
Proof. Let j > 1. By (3.19), 



E Al C/ m (a, 5x ), /orj = 0, 
^U^gx), forj>l, 



J j— 1 



so that with = min{i > 1 : Y^ 1 = 1}, 



- m 









(- Si <j m 

r T o ^ m 

U=o J 

= E i , 1 C/ m (a, 5 x), 
where we have used the fact that 

C Xl {X T i_ i+e , l<£<8j} = C a {X' e , 1<£<S 1 J = C Ul {X' e , 0<i< tq 1 }, 

with £\ denoting the simultaneous distribution with initial measure A. 
If j = 0, then 



E x {V£\H j -i} = E> 



J29x(X t )£ w (W t ,y) 



i=0 



E Xl U^(a,g x ). 
□

Using the previous lemma, the factorization of g given by (3.16) and a 
general moment formula given in Corollary A.l in Appendix A, we obtain 
a useful exact formula. The notation is in accordance with Theorem A.l in 
Appendix A. We use the index set A™ = {a S M r + : X^=i a j = m i' wnere -A/+ 
is the r-Cartesian product of all strictly positive integers, the multinomial 
coefficient Q = ^0^, A/^ = M x M^ 1 and j {2) = (j 2 , . . . ,j r ) e N r +\ 



Lemma 3.5. Let be defined by (3.10). Let m>l. Then 



T 

' m 



k=0 r=laeA 

where 



(3.20) E,E^ = EE(; E {w&.aHWi*.}, 



f X = J ai HP I oc 2 ■ ■ -R\ r L «rl 

•0(2)," 9 X 1 9 X 1 »X ' 

More generally, we have for A = Ai (8> A 2; wiffo = H^ 1 a and /j^ 



pji f 

2 J] (2 ),a> 



eaE^=e e p e (A-^)c/fa®/r 

(3.21) ^=o r=iaeA- v ieA^+ 



Remark 3.1. If A = u, then 

AG P 2 fZ = vG P 2 Pi 1 fJ V a = ^s 2 Pi 1+1 f^. a = KsJV a , 



oo oo 

E *f*a = ^ E H lfj\v,« = ^Jj^,a- 
31=0 ji=0 

Thus, (3.21) reduces to (3.20) when A = v. 

Proof of Lemma 3.5. We rewrite the first term on the left-hand side 
of (3.20) using the fact that 1(T >k) = 1(T >k — l)l(Y^ = 0), so that 



r 
I 

k=0 k=l 



^ E v ™ = E ^ v o m ) + E Micr > - 1)^(^1^ = o) i 
Let $V_{k,\zeta}$ be defined by (3.19), where $\zeta_W(w, y) = g_W(w)(1 - y)$. By Lemma 3.4 and its proof, it is seen that the conditional mean given $\mathcal H_{k-1}$ only involves the regeneration of $\{X_t\}$ and we can therefore use Appendix A (and more specifically Corollary A.1) to obtain for $k \ge 1$,

E u [V k m l(Y* =0)|W fc _i] 



E„[V$|74_i] 



E E 

r=la£A!T 



E W&} 



n^(^_ 1+ti+1 ; 



i=l 



1(^ = 0), 



where t 4 = ji + • • • + fr. Let Q k = JP* V V V . Then by condi 
tioning with respect to Gk—li we find that 



eJ i(r>ife-i) 



-ti+i> 



Lt=l 



1(^=0) 



= E I/ {l(r>fc-l)fT 2 /X(W T i_ i )} 



Hence, 
r 



E v E *fc m = E > k - 1)E V [V^1(Y^ = 0) | H k -i]} 



k=l 

(3.22) 



k=l 
m 



= E E 

r=loeAjr 
m 

= E E 

Similarly, we find that 



J2 WfoKfeHTzk-vnif^w^^ 

3^o,+ 



>fc=l 



(3.23) 



w=E E 

r=laGA" 



E w*JW& 

and by combining (3.22) and (3.23) and using $\pi_{s_2}H_2 = \pi_{s_2} - \nu_2$, we get



r m 

E,E^ m = E E 

fc=0 r=la£A? 

Now, n s j] v a = TT s J^ !)t0 , and 



E = E j# flf^ • • • flf^ 1 = 



ji=0 



ii=o 



9 X 



'3(2), 
and thus (3.20) is proved.

The proof of (3.21) is similar. Instead of (3.22), we obtain 




r=loeA™ v 7 j6A/q + 

and (3.23) is changed to 



(3.25) 




Combining (3.24) and (3.25) and using XG g v B.2 = XQ s v ?2 — 1^2, we obtain 
(3.21). □ 

Remark 3.2. If $m = 1$, then by (3.20), $\mu(g) = E_\nu\sum_{j=0}^{\tau}V_j = \{\pi_{s_1}g_x\}\{\pi_{s_2}g_W\}$,
which, using (3.11), is consistent with (3.4).

Remark 3.3. If m = 2, then 

T 00 

E -E V? = K^f}K^} + 2Y d {^sJ gx H{l gx \}{'K ai I gw P l I aw \}. 

3=0 l=\ 

Remark 3.4. By (3.11) and (3.21), we find that for general A = Ai <8> A 2 , 
9 = 9x® 9w > we have 

00 00 
®Mg) = £(A - v)(P(gx ® Pigw) + ]T{^<7x}{AG P^gw}. 

3=0 j=0 

If gx is small, A2 = ^2 and sup^^G P| + jffw | < 00, then E\Uo(g) is fi- 
nite. More generally, taking p = 1 + S, f = P2 +1 \gw\ and A = A2 = 7r 2 in 
Lemma A. 2, we have that 

*2Q^I {l+ ^Pt l \gw\ < c2Ki {l+r,s) Z 1+2S {^2 S /{1+v5) \gw\ {1+mr > s) }, 

with $\eta \in (0,1)$ and $\delta > 0$ arbitrary. The result can now be combined with
Lemma 3.3, which ensures the existence of moments of $\tau$ under appropriate
mixing conditions.






Proof of Theorem 3.2. We first prove (3.17). Assume that $g_x \ge 0$.
By Cauchy-Schwarz, recalling that $\sum_{j=1}^{r}\alpha_j = m$, we have



K2/ 



(3.26) 



m 



r=l 



Inserting (3.26) into (3.20), we obtain 

T m , v 



k=0 



r=la£A r r 



J(2)i 



w S2 \ 9w rEU m (g x ), 



from which (3.17) follows trivially. 

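To prove (3.18), let $r = 1+\delta$ and $q = 1+\delta^{-1}$, so that $1/r + 1/q = 1$. Then $E_\nu|U_0(g)|^p = E_\nu|\sum_{j=0}^{\tau}V_j|^p$ and

\[
E_\nu\Bigl|\sum_{j=0}^{\tau}V_j\Bigr|^p
\le E_\nu\Bigl\{\max_{0\le j\le\tau}|V_j|^p\,|\tau+1|^p\Bigr\}
\le E_\nu^{1/r}\Bigl\{\max_{0\le j\le\tau}|V_j|^{pr}\Bigr\}\,E_\nu^{1/q}|\tau+1|^{pq}
\le E_\nu^{1/r}\Bigl\{\sum_{j=0}^{\tau}|V_j|^{pr}\Bigr\}\,E_\nu^{1/q}|\tau+1|^{pq}.
\]

□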



3.7. Moment bounds of $U(g_h)$ expressed in terms of bandwidth. The following
results describe how higher-order moments of $U$ behave as functions
of the bandwidth. This is what is needed to apply C4 and C5 in Theorem 3.1.

Theorem 3.3. Let $g_x = g_{x,h} = K_{x,h}$ and assume that conditions B2,
B3 and (3.16) hold. Then for all integers $k, m \ge 1$,

\[
E_\nu|U(g_h)|^{2m} \le d_{m,k}\,h^{-2m+1/(k+1)}, \tag{3.27}
\]

where

\[
d_{m,k} = \{\pi_{s_2}|g_W|^{2m(k+1)}\}^{1/(k+1)}\;E_\nu^{k/(k+1)}|\tau+1|^{2m(k+1)/k}\;\{d'_{2m(k+1)}\}^{1/(k+1)}
\]

and the sequence of constants $\{d'_m\}$ is only dependent on $\mathcal{N}_x$ and $\sup_u K(u)$.



Proof. By (3.18), with $p = 2m$ and $\delta = k$, we have

\[
E_\nu U^{2m}(g_h) \le E_\nu^{1/(k+1)}\Bigl\{\sum_{j=0}^{\tau}|V_j|^{2m(k+1)}\Bigr\}\,E_\nu^{k/(k+1)}|\tau+1|^{2m((k+1)/k)}. \tag{3.28}
\]

From (3.17), we have

\[
E_\nu\sum_{j=0}^{\tau}|V_j|^{2m(k+1)} \le \pi_{s_2}|g_W|^{2m(k+1)}\,E_{\nu_1}U^{2m(k+1)}(|g_{x,h}|) \tag{3.29}
\]

and by KT [Lemma 5.2 with $\xi_0 = 1$ and $\mathcal{G}_2$ replaced by $\mathcal{G}_1 = \{f : (E,\mathcal{E}) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))\}$,
where $\mathcal{B}(\mathbb{R})$ is the class of all Borel sets on $\mathbb{R}$],

\[
E_{\nu_1}U^{2m(k+1)}(|g_{x,h}|) \le d'_{2m(k+1)}\,h^{-2m(k+1)+1}. \tag{3.30}
\]

In the proof of that lemma, it is also shown that the sequence of constants
$\{d'_m\}$ is only dependent on $\mathcal{N}_x$ and $\sup_u K(u)$.

Inserting (3.29) and (3.30) into (3.28), we get (3.27). □

3.8. Asymptotic variance. Exact information about the first order prop- 
erties of the asymptotic variance is important (cf. C2 and C3). Such infor- 
mation is contained in the next result, which is the analogue of Lemma 5.1 
of KT. Our method of proof uses a truncation technique based on the notion 
of a generalized autocovariance function. We believe the latter concept to 
be of some independent interest. 

Theorem 3.4. Assume that the process $\{W_t\}$ is an irreducible, ergodic,
strongly $\alpha$-mixing process which satisfies (2.2) and has mixing rate satisfying
$\sum_{\ell}\ell^{[2/k]\vee 1}\alpha_\ell < \infty$, $\pi_2 g_W = 0$ and $\pi_2|g_W|^{2(k+1)} < \infty$ for some integer $k \ge 1$.

Assume, in addition, that $g_{x,h} = K_{x,h}$ and that conditions B2-B4 hold.
Then if $\mu(g_W) = 0$, we have, as $h \downarrow 0$,

(i) $h\sigma^2(g_{x,h}\otimes g_W) = p_{s_1}(x)\|K\|_2^2\,\pi_{s_2}g_W^2 + o(1)$;

(ii) $h\sigma^2(|g_{x,h}\otimes g_W|) = h\sigma^2(g_{x,h}\otimes g_W) + o(1)$.

In the proof of Theorem 3.4, we need some results of a more general nature
concerning generalized autocovariances. These are formulated in Lemmas 3.6
and 3.7 below, for a general $\phi$-irreducible, aperiodic, Harris recurrent Markov
chain $\{X_t\}$ with transition function $P$ satisfying (2.2) and with the $S_n(g)$-decomposition,
as in (2.11). The next step is Lemma 3.8, where we apply
Lemmas 3.6 and 3.7 to our Markov chain $\{(X_t,W_t)\}$ [with a slight conflict
of notation, taking $X_t = (X_t,W_t)$].

We begin by extending the notion of a cross-covariance function, as de- 
fined for ergodic processes. 

Definition 3.1. Let $g, f \in L^1(\pi_s)\cap L^2(\pi_s)$. The generalized covariance
and cross-covariance function is defined by

\[
\gamma_{g,f}(\ell) =
\begin{cases}
\pi_sI_{g_0}f_0 + \mu_g\mu_f(1 - \pi_ss^2), & \text{when } \ell = 0,\\
\varphi_gP^{\ell-1}f_0, & \text{when } \ell \ge 1,\\
\gamma_{f,g}(-\ell), & \text{when } \ell < 0,
\end{cases} \tag{3.31}
\]

where $\varphi_g := \pi_sI_gP - \mu_g\nu$ and $\gamma_g := \gamma_{g,g}$. Mean centering a function $f$ with
$s\mu_f$ produces a function denoted by $f_0$, that is, $f_0 := f - s\mu_f$.

Note that when $\mu_g = \mu_f = 0$, the generalized cross-covariance function is
equal to the ordinary one, apart from the constant $c_\pi = \pi s$, that is, $\gamma_{g,f} =
c_\pi^{-1}\bar\gamma_{g,f}$, where $\bar\gamma_{g,f}$ denotes the stationary covariance function.

Lemma 3.6. Assume that $|g|$ is small. Then $\sigma^2(g) = \sum_{\ell=-\infty}^{\infty}\gamma_g(\ell)$.

Proof. We have, by (A.12) in Appendix A and by (3.4), that

\[
\sigma^2(g) = \gamma_g(0) + 2\varphi_gG_{s,\nu}g_0.
\]

Iterating $G_{s,\nu} = I + (P - s\otimes\nu)G_{s,\nu}$, we get $G_{s,\nu} = G^{(n)} + P^nG_{s,\nu} - G^{(n)}s\otimes\pi_s$
with $G^{(n)} = \sum_{\ell=0}^{n-1}P^{\ell}$. Pre- and post-multiplying this equation by $\varphi_g$ and $g_0$,
respectively, gives $\varphi_gG_{s,\nu}g_0 = \sum_{\ell=1}^{n}\gamma_g(\ell) + \varphi_gP^n\psi$, where $\psi := G_{s,\nu}g_0$. By
Nummelin ([24], Theorem 6.7, page 109), and since $\|\varphi_g\| \le 2\mu_{|g|}$, $|g_0|$ is small
and $\psi$ is bounded, we find that $\varphi_gP^n\psi = o(1)$, from which the result follows.
□

Remark 3.5. The formula in Lemma 3.6 can be viewed as a generalization
of the formula $\operatorname{Var}(n^{-1/2}\sum_{j=0}^{n}X_j) = \sum_{\ell=-\infty}^{\infty}\operatorname{Cov}(X_t,X_{t-\ell}) + o(1)$
in the case where $\{X_t\}$ is a stationary process with an absolutely summable
covariance function.
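
As a numerical check of the stationary special case quoted in Remark 3.5, the sketch below evaluates both sides of the formula for a stationary AR(1) process. The AR(1) model, the function names and the truncation lag are illustrative assumptions and do not come from the paper; the sum here runs over n terms rather than n + 1, which does not affect the limit.

```python
"""Long-run variance of a stationary AR(1) process (illustrative assumption).

Assumed model: X_t = a * X_{t-1} + e_t with |a| < 1 and Var(e_t) = sigma2.
"""


def ar1_autocov(lag: int, a: float, sigma2: float = 1.0) -> float:
    """Cov(X_t, X_{t-lag}) for the stationary AR(1) process."""
    return sigma2 * a ** abs(lag) / (1.0 - a * a)


def autocov_sum(a: float, sigma2: float = 1.0, max_lag: int = 200) -> float:
    """Truncated version of sum_{l=-inf}^{inf} Cov(X_t, X_{t-l})."""
    return sum(ar1_autocov(lag, a, sigma2) for lag in range(-max_lag, max_lag + 1))


def scaled_sum_variance(n: int, a: float, sigma2: float = 1.0) -> float:
    """Exact Var(n^{-1/2} sum_{j=1}^{n} X_j) = sum_{|l|<n} (1 - |l|/n) gamma(l)."""
    return sum(
        (1.0 - abs(lag) / n) * ar1_autocov(lag, a, sigma2)
        for lag in range(-(n - 1), n)
    )


if __name__ == "__main__":
    a = 0.5
    print(autocov_sum(a))               # close to sigma2 / (1 - a)^2 = 4
    print(scaled_sum_variance(500, a))  # approaches the same value as n grows
```

For a = 0.5 the gap between the two quantities is of order 1/n, which is the o(1) term in the remark.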

It is necessary to weaken the assumption of smallness in Lemma 3.6. 

Lemma 3.7. Assume that

(i) $g \in L^1(\pi_s)\cap L^2(\pi_s)$, $\pi_sI_{|g|}PG_{s,\nu}|g| < \infty$.

If there is an approximating sequence $\{g_n\}$, in the sense that $|g_n|$ is a small
function, $|g_n| \le |g|$ and $g_n \to g$ a.s. $[\pi_s]$, then for each $\ell$,

(ii) $\gamma_{g_n}(\ell) \to \gamma_g(\ell)$ as $n \to \infty$;
(iii) $\sigma^2(g) = \lim_n\sum_{\ell=-\infty}^{\infty}\gamma_{g_n}(\ell)$.

where J2iLi sup n \d n! e\ < 00, rj n = n + 0(1) and J2eiLi °t < °°- Then 
(v) o- 2 (g) = J2iiL-oc 7s an d if r l = Q> the convergence is absolute. 
Proof. Let $\{g_n\}$ be an approximating sequence which satisfies the conditions
in the lemma. First, we prove that

\[
\lim_n\pi_sI_{g_n}PG_{s,\nu}g_n = \pi_sI_gPG_{s,\nu}g. \tag{3.32}
\]

Let $\xi_n = PG_{s,\nu}g_n$, $\xi = PG_{s,\nu}g$ and $\bar\xi = PG_{s,\nu}|g|$ so that $|\xi_n| \le \bar\xi$. We must
show that $\xi_n \to \xi = PG_{s,\nu}g$ a.s. $[\pi_s]$. Let $D$ be the set of points where
$g_n$ fails to converge toward $g$. Then $\pi_s\{G1_D > 0\} = 0$ with $G = \sum_{\ell=0}^{\infty}P^{\ell}$
since $\pi_s$ is a maximal irreducible measure. Hence, $\pi_s\{PG_{s,\nu}1_D > 0\} = 0$. The
rest of the proof of (3.32) follows directly from the dominated convergence
theorem since $\pi_sI_{g_n}PG_{s,\nu}g_n = \pi_sI_{g_n}\xi_n = \pi_s(g_n\xi_n)$, where $|g_n\cdot\xi_n| \le \{|g|\cdot\bar\xi\}$
and $(g_n\cdot\xi_n) \to (g\cdot\xi)$ a.s. $[\pi_s]$.

By Lemma 3.6 and (3.32), statement (iii) holds. It is obvious that (ii)
holds and if (iv) is true, then $\sum_{\ell=1}^{\infty}\gamma_{g_n}(\ell) \to \sum_{\ell=1}^{\infty}\gamma_g(\ell)$, by the dominated
convergence theorem. Together with (iii), we can conclude that (v) is true.
□

In the next lemma, we return to the Markov chain {(Xt,Wt)} and let it 
play the role of the general Markov chain in Lemmas 3.6 and 3.7. 

Lemma 3.8. Assume that $g(x,w) = g_1(x)g_2(w)$, $g_1$ is small and that
$\{W_t\}$ satisfies the conditions stated in Theorem 3.4. Then $\sigma^2(g) = \sum_{\ell=-\infty}^{\infty}\gamma_g(\ell)$.

Proof. Our proof is based on Lemma 3.7. We must show that Lemma 3.7(i)
and Lemma 3.7(iv) are satisfied. We do not assume that $\mu_{g_2} = 0$.
By (3.28), (3.29) with $m = 1$ and the smallness of $g_1$, we have

\[
\pi_s(g^2) + 2\pi_sI_{|g|}PG_{s,\nu}|g| \le E_\nu U^2(|g|)
\le c_0\{\pi_2|g_2|^{2(k+1)}\}^{1/(k+1)}\,E_\nu^{k/(k+1)}\tau^{(2k+2)/k}. \tag{3.33}
\]

The quantity $\pi_2|g_2|^{2(k+1)}$ is finite by assumption. We have that $E_\nu\tau^{(2k+2)/k} =
E_{\nu_2}\tau^{(2k+2)/k}$, thus the right-hand side of (3.33) is finite if $E_{\pi_2}\tau^{(k+2)/k} < \infty$
(cf. [4, 5]). By Lemma 3.3, this is true if $\sum_{\ell=1}^{\infty}\ell^{[2/k]\vee 1}\alpha_\ell < \infty$ and is thus
satisfied by the mixing assumption on $\{W_t\}$. Hence, Lemma 3.7(i) holds.

We begin the next step by establishing an approximating sequence for
$g = g_1\otimes g_2$. By Nummelin ([24], Corollary 2.1, page 24), there exists an
increasing sequence of small sets $C_n'$ such that $\bigcup_{n=1}^{\infty}C_n' = E_2$, where $E =
E_1\times E_2$. Define $C_n = (|g_2| \le n)\cap C_n'$. Let

\[
g_n = g_1\otimes g_2^n, \qquad g_2^n = g_21_{C_n}. \tag{3.34}
\]

Then $|g_n|$ is small for all $n$ and $|g_n| \uparrow |g|$ a.s. $[\pi_s]$.
We express $\gamma_{g_1\otimes g_2^n}$ in terms of $g_1$ and $g_2^n$ using Definition 3.1. At the same
time, we insert A| = {v^P^ ~~ 7r 2), = ir 2 s, 7 S2 = c 7T j g2 and ~p g2 = c 7T fi 92 . 
This gives, for i > 0, 

(3.35) + ^K/j.^SioK^W -T^A^} 
+ ^7 91 W{A^ 1 s 2 + ^ 2 s 2 }. 

By the mixing property of $\{W_t\}$, we find that $\gamma_{g_2}$ is absolutely summable
(cf. [14], Corollary A.2, page 278). Moreover, since the recurrence time for
$\{W_t\}$ has a finite second-order moment, $\{W_t\}$ is ergodic of degree 3 as a
Markov chain (cf. [24], page 84) and that implies the finiteness of $\sum_{\ell=1}^{\infty}\|\Delta_\ell^2\|$
(cf. [24], Theorem 6.13, page 118). By Lemma 3.6, $\gamma_{g_1}$ is summable. It is now
easy to verify that each of the four terms of $\gamma_{g_n}(\ell)$ given by (3.35) satisfies
Lemma 3.7(iv). Hence, by Lemma 3.7(v), the proof is finished. □

Proof of Theorem 3.4. By Lemma 3.8 and Definition 3.1, since
$\mu_{g_W} = 0$, we have

\[
\sigma^2(g_{x,h}\otimes g_W) = \sum_{\ell}\{\pi_{s_1}I_{K_{x,h}}P_1^{|\ell|}K_{x,h}\}\,\gamma_{g_W}(\ell).
\]

For $\ell > 0$, $h\pi_{s_1}I_{K_{x,h}}P_1^{\ell}K_{x,h} = o(1)$, by B4 [cf. KT, proof of part (b) of
Lemma 5.1]. Since

\[
|\pi_{s_1}I_{K_{x,h}}P_1^{\ell}K_{x,h}| \le \int p_{s_1}(x+hu)K(u)P_1^{\ell}(x+hu,\mathcal{N}_x(h))\,du \le c
\]

and $\sum_{\ell}|\gamma_{g_W}(\ell)|$ is finite, we can apply the dominated convergence theorem,
that is,

\[
\lim_{h\downarrow 0}h\sigma^2(g_{x,h}\otimes g_W)
= \sum_{\ell}\lim_{h\downarrow 0}\{h\pi_{s_1}I_{K_{x,h}}P_1^{|\ell|}K_{x,h}\}\,\gamma_{g_W}(\ell)
= \lim_{h\downarrow 0}\{h\pi_{s_1}K_{x,h}^2\}\,\gamma_{g_W}(0)
= p_{s_1}(x)\|K\|_2^2\,\pi_{s_2}g_W^2. \tag{3.36}
\]

The proof of Theorem 3.4(ii) follows in a similar way, using Lemma 3.8
and (3.35) with $g_2 = |g_W|$. For $\ell = 0$, Definition 3.1 must be used. □

3.9. Main result. 

Theorem 3.5. Assume that $\{X_t\}$ and $\{W_t\}$ are independent, recurrent
Markov chains in (3.1) and that $\{W_t\}$ is an irreducible, ergodic, strongly
$\alpha$-mixing process which satisfies (2.2) and has a mixing rate satisfying
$\sum_{\ell}\ell^{[2/k]\vee 1}\alpha_\ell < \infty$, $\pi_2g_W = 0$ and $\pi_2|g_W|^{2m(k+1)} < \infty$ for some integers $k \ge 1$
and $m \ge 2$, $\pi_2$ being the invariant probability measure of $\{W_t\}$. Moreover,
assume that B1-B4 hold and that (2.2) and the tail condition (2.13) hold for
$\{X_t\}$. Finally, assume that for some $\varepsilon > 0$,

\[
h_n^{-1} \ll n^{\beta\delta_m-\varepsilon}, \qquad \delta_m = \frac{m-1}{m-1/(k+1)}.
\]

Then for all $\lambda = \lambda_1\otimes\pi_2$, we have

\[
\Bigl\{h_n\sum_{t=0}^{n}K_{x,h_n}(X_t)\Bigr\}^{1/2}
\Bigl\{\hat f(x) - f(x) - \frac{\pi_{s_1}I_{K_{x,h_n}}\psi_x}{\pi_{s_1}K_{x,h_n}}\Bigr\}
\stackrel{d}{\to} \mathcal{N}(0,\sigma_W^2\|K\|_2^2).
\]

If the density $p_{s_1}$ and the function $f$ possess continuous derivatives of second
order, then the bias term $\pi_{s_1}I_{K_{x,h_n}}\psi_x/\pi_{s_1}K_{x,h_n}$ is negligible when $h_n^{-1} \gg n^{\beta/5+\varepsilon}$.

Proof. We use Theorem 3.1 on the compound chain $\{(X_t,W_t)\}$. As
noted at the beginning of Section 3, it is enough to prove that

\[
\Delta_{n,h_n} = S_n^{-1/2}(K_{x,h_n})\,h_n^{1/2}\,S_n(K_{x,h_n}\otimes g_W) \stackrel{d}{\to} \mathcal{N}(0,\|K\|_2^2\,\pi_2g_W^2).
\]

Recall that for $C = C_1\times C_2$, $C_i \in \mathcal{E}_i$, $i = 1,2$,

\[
T_C(n) = \sum_{t=0}^{n}1_{C_1}(X_t)1_{C_2}(W_t)
\]

represents the number of visits of $\{(X_t,W_t)\}$ to $C$ up to time $n$. We choose
$C_1$ and $C_2$ so that both sets are small. Then by KT (the second part of
Theorem 5.3), using B2-B4 and the tail condition (2.13), we have

^ 1 \ def S n {K x h ) . . 

PCifr) = = y^=poi(ac) + Op(l), 

with E = Ei x E2 and where pc a (x) = p si (x)/tt Si 1c± . By KT (Remark 3.5), 

= + °W = ^ + °« ^ 

We can write 

An^^wl-^^V'^T- 1 / 2 ^)^^^)} 
UC1XE2W J 

say, where g h {z,w) = K x ^ h {z)g w (w) and A„ i/ln = {p^I(x)tt 2 Ic 2 } + Cp(l)- 
Hence, it is enough to prove that 

(3-37) A^^A/( , m (x)||K|| 2 ^f|). 
By B3 and Bochner's theorem, C1 is satisfied. From Theorem 3.3 and
Theorem 3.4, conditions C2-C5 are satisfied with $v = 1/(k+1)$.

It only remains to verify C7. Let $g_0 = c_01_{\mathcal{N}_x}\otimes|g_W|$, where $c_0$ is an appropriate
constant. Then $|hg_h| \le g_0$. We must prove that $P_\lambda(U_0(g_0) < \infty) = 1$,
with [cf. (3.11)] $U(g_0) = \sum_{j=0}^{\tau}V_j(g_0)$. But this is satisfied if $E_\lambda U_0(g_0) < \infty$.
By Remark 3.4, this is true if

\[
E_{\nu_2}|\tau|^{1+2\delta}\;\pi_2|g_W|^{(1+\delta)/(\eta\delta)} < \infty \tag{3.38}
\]

for some $\delta > 0$ and $\eta \in (0,1)$. Let $k \ge 1$ be fixed and $\pi_2|g_W|^{2(k+1)} < \infty$. By
Lemma 3.3, (3.38) is satisfied if


This is true if 



1 ~\~ S 2 
— — <2(k + l), 1 + 25 < 2 + -. 
rjO k 



2A; + 1 /c <5(2/c + l) 

Thus, (3.38) holds. 

Hence, by Theorem 3.1, $\Delta^{0}_{n,h_n} \stackrel{d}{\to} \mathcal{N}(0,\sigma_C^2)$ and by Theorem 3.4,

\[
\sigma_C^2 = \{\pi_{s_1}1_{C_1}\}^{-1}\{\pi_{s_2}1_{C_2}\}^{-1}\,p_{s_1}(x)\|K\|_2^2\,\pi_{s_2}g_W^2.
\]
It follows that (3.37) holds. □ 
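
To make the objects in Theorem 3.5 concrete, the following sketch computes the kernel estimate of f(x) together with the random normalization $\sum_t K_{x,h}(X_t)$ for a simulated random-walk regressor. It assumes the convention $K_{x,h}(y) = h^{-1}K((y-x)/h)$; the Epanechnikov kernel, the Gaussian innovations, the AR(1) residual (started at zero rather than in its stationary law) and all function names are illustrative choices, not taken from the paper.

```python
"""Kernel regression estimate for Z_t = f(X_t) + W_t (illustrative sketch)."""
import math
import random
from typing import Callable, List, Sequence, Tuple


def epanechnikov(u: float) -> float:
    """Compactly supported kernel; an assumed choice."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nw_estimate(
    xs: Sequence[float],
    zs: Sequence[float],
    x: float,
    h: float,
    kernel: Callable[[float], float] = epanechnikov,
) -> Tuple[float, float]:
    """Return (f_hat(x), sum_t K_{x,h}(X_t)); f_hat is NaN if no mass at x."""
    weights = [kernel((xt - x) / h) / h for xt in xs]
    mass = sum(weights)
    if mass == 0.0:
        return math.nan, 0.0
    return sum(w * z for w, z in zip(weights, zs)) / mass, mass


def simulate(
    n: int, f: Callable[[float], float], a: float = 0.5, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Random-walk regressor with an independent AR(1) residual (assumed model)."""
    rng = random.Random(seed)
    x, w = 0.0, 0.0
    xs: List[float] = []
    zs: List[float] = []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)         # null recurrent regressor
        w = a * w + rng.gauss(0.0, 1.0)  # residual, independent of the regressor
        xs.append(x)
        zs.append(f(x) + w)
    return xs, zs


if __name__ == "__main__":
    xs, zs = simulate(20000, math.sin)
    f_hat, mass = nw_estimate(xs, zs, x=0.0, h=0.3)
    print(f_hat, mass)
```

Because a null recurrent regressor returns to a neighbourhood of x only rarely, the second return value plays the role of an effective sample size; it is the quantity that appears inside the normalization of the theorem.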

Remark 3.6. If $k = 1$, then we require that the residual $\{W_t\}$-process
have a finite eighth-order moment, together with a mixing rate which satisfies
$\sum_{\ell}\ell^2\alpha_\ell < \infty$. If, on the other hand, all moments of the residual process
are finite, then it is enough for there to exist a $\delta > 0$ such that

4. Some extensions to the dependent case. In linear cointegration theory,
the stationary process $\{W_t\}$ resulting from a linear cointegration relationship
$W_t = Z_t - aX_t$, say, will generally be dependent on $\{X_t\}$. From this
point of view, it is of interest to extend the theory of Section 3. We will do
this by assuming that $\{(X_t,W_t)\}$ is a Markov chain in (3.1) and specifying a
dependence relation between them for which the asymptotic theory holds. In
this situation, we will prove that the compound process $\{(X_t,W_t)\}$ is $\beta$-null
recurrent, as was done in the previous section. But, unlike Section 3, we essentially
assume that the function $(u,w) \mapsto K_{x,h}(u)g_W(w)$ is small. In this
way, it is guaranteed that the necessary moment requirements are satisfied.
In addition, we need existence and smoothness of an invariant measure for
the compound chain, together with additional conditions which control the
bias.
4.1. Conditional expectation. The restriction on the type of dependence
allowed between $\{X_t\}$ and $\{W_t\}$ will be formulated in terms of the conditional
expectation of $W_t$ with respect to $X_t$. Let $\{(X_t,W_t)\}$ be Harris null
recurrent with state space $(E,\mathcal{E}) = (E_1\times E_2,\mathcal{E}_1\otimes\mathcal{E}_2)$, invariant measure $\pi_s$
and maximal irreducibility measure $\phi$. Assume that

\[
\pi_s1_{C_1\times E_2} < \infty \quad\text{for some } C_1 \in \mathcal{E}_1
\]

and let

\[
Q := \frac{\pi_sI_{C_1\times E_2}}{\pi_s1_{C_1\times E_2}} \tag{4.1}
\]

so that $Q$ is a probability measure on $(E',\mathcal{E}') = (E_1'\times E_2,\mathcal{E}_1'\otimes\mathcal{E}_2)$ with $E_1' =
C_1$ and $\mathcal{E}_1' = \mathcal{E}_1\cap C_1$. Here, $I_{C_1\times E_2}$ is defined as in Section 2. A generic point
in $E'$ is denoted by $(y,w)$. In this setting, we specialize further, assuming
that

\[
E = E_1\times E_2 \subseteq \mathbb{R}\times\mathbb{R} \quad\text{and}\quad \mathcal{E} \subseteq \mathcal{B}(\mathbb{R}^2) \tag{4.2}
\]

and that $Q$ is a bivariate distribution on $\mathcal{B}(\mathbb{R}^2)$ which can be identified by a
stochastic vector $(X,W)$. The generalized conditional expectation $\mu_{W|X}[g]$
is the conditional expectation of $g(X,W)$ given $X = y$, that is,

\[
\mu_{W|X}[g](y) := E_Q[g(X,W) \mid X = y], \qquad g \in L^1(E',\mathcal{E}',Q). \tag{4.3}
\]

The following definition of a generalized conditional variance is an immediate
consequence of (4.3):

\[
\sigma_g^2(y) := \mu_{W|X}[g^2](y) - \mu_{W|X}^2[g](y). \tag{4.4}
\]

From (4.1), it follows that $Q$ is independent of the specific normalization
which identifies a particular function of $s$. Hence, $\mu_{W|X}$ is independent
of a specific atom and also of the assumption that $m_0 = 1$, which is
important in applications to real data. Suppose that $C_1'$ is an alternative to
$C_1$ and let $\mu'_{W|X}$ be the alternative conditional expectation. Then

\[
\mu_{W|X}[g]1_{C_1\cap C_1'} = \mu'_{W|X}[g]1_{C_1\cap C_1'}.
\]

If $\pi_s$ is absolutely continuous with respect to two-dimensional Lebesgue
measure and has density $p_s$, then with

\[
p_s^X(y) = \int p_s(y,w)\,dw
\]

and

\[
p_{W|X}(w \mid y) = \frac{p_s(y,w)}{p_s^X(y)}\,1(p_s^X(y) > 0), \qquad (y,w) \in E_1\times E_2,
\]

we have

\[
\mu_{W|X}[g](y) = \int g(y,w)\,p_{W|X}(w \mid y)\,dw.
\]
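
The density form of the generalized conditional expectation (4.3) and of the conditional variance (4.4) can be evaluated numerically by integrating out w. The sketch below does this with a trapezoid rule; the example density, the integration limits and the function names are hypothetical and only serve to illustrate the definitions. Note that the density need not integrate to one over (y, w), since only the ratio of two integrals in w is used.

```python
"""Generalized conditional mean and variance from an invariant density (sketch)."""
import math
from typing import Callable

Density = Callable[[float, float], float]
TestFunction = Callable[[float, float], float]


def trapezoid(func: Callable[[float], float], lo: float, hi: float, steps: int = 2000) -> float:
    """Composite trapezoid rule on [lo, hi]."""
    step = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * step)
    return total * step


def conditional_mean(g: TestFunction, p_s: Density, y: float,
                     lo: float = -10.0, hi: float = 10.0) -> float:
    """mu_{W|X}[g](y) = int g(y,w) p_s(y,w) dw / int p_s(y,w) dw."""
    marginal = trapezoid(lambda w: p_s(y, w), lo, hi)
    if marginal <= 0.0:
        raise ValueError("marginal density must be positive at y")
    return trapezoid(lambda w: g(y, w) * p_s(y, w), lo, hi) / marginal


def conditional_variance(g: TestFunction, p_s: Density, y: float,
                         lo: float = -10.0, hi: float = 10.0) -> float:
    """sigma_g^2(y) = mu_{W|X}[g^2](y) - (mu_{W|X}[g](y))^2, as in (4.4)."""
    first = conditional_mean(g, p_s, y, lo, hi)
    second = conditional_mean(lambda yy, w: g(yy, w) ** 2, p_s, y, lo, hi)
    return second - first * first


def example_density(y: float, w: float) -> float:
    """Hypothetical density: flat in y, Gaussian in w around 0.5 * y."""
    return math.exp(-0.5 * (w - 0.5 * y) ** 2)


if __name__ == "__main__":
    identity = lambda y, w: w
    print(conditional_mean(identity, example_density, 1.0))
    print(conditional_variance(identity, example_density, 1.0))
```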

In the remainder of this section, we assume that the state space is given
as in (4.2). By $x$, we denote a fixed point in $E_1$ and by $C_0$, the support
of $g_W$. Let $\mathcal{M}_x(h) = \mathcal{N}_x(h)\times C_0$, $0 \le h \le 1$, $\mathcal{M}_x = \mathcal{M}_x(1)$, $\mathcal{N}_x = \mathcal{N}_x(1)$
and $\mathcal{M}_{\{x\}} = \mathcal{M}_x(0)$, $\mathcal{N}_x$ being defined at the beginning of Section 2.1. If
$\lambda$ is the initial measure for the compound chain, then $\lambda_{W|X}(\cdot \mid x_0)$ is the
conditional distribution of $W_0$, conditional on $X_0 = x_0$, and $\lambda_X = \lambda(\cdot\times E_2)$
is the marginal initial measure for $X_0$.

4.2. Conditions and dependence. In order to extend our asymptotic re- 
sult to the dependent case, we will apply the conditions stated below. 

The first set of conditions is related to the Markovian structure of the 
compound chain. Basically, we assume that the {Xt}-process also determines 
the /3-null structure for the compound process. This holds in the independent 
case (cf. Lemma 3.1). 

D1 (i) The process $\{(X_t,W_t)\}$ is a $\phi$-irreducible, Harris recurrent Markov
chain with state space given by (4.2) and transition probability function $P$.
(ii) The minorization inequality (2.2) with $(s,\nu)$ is satisfied with a corresponding
invariant measure $\pi_s$.
D2 (i) The marginal process $\{X_t\}$ is a $\phi_1$-irreducible, Harris recurrent
Markov chain on $(E_1,\mathcal{E}_1)$ with transition probability function $P_1$.
(ii) The minorization inequality (2.2) is satisfied with $(s_1,\nu_1)$.
(iii) The Markov chain $\{X_t\}$ is $\beta$-null recurrent.
(iv) There exists a set $C_1 \in \mathcal{E}_1^+$ such that $\xi := 1_{C_1\times E_2} \in \mathcal{E}^+$ is $\pi_s$-integrable.
D3 (i) The invariant measure $\pi_s$ has a density, $p_s$, with respect to the
two-dimensional Lebesgue measure.
(ii) $\int p_s(x,w)\,dw > 0$.
(iii) $\lim_{\delta\to 0}\int|p_s(x+\delta,w) - p_s(x,w)|\,dw = 0$.
(iv) The marginal transition probability function $P_1$ is independent of
any initial distribution $\lambda$.
D4 (i) $g_W$ is bounded and $0 < \int|g_W(w)|\,dw < \infty$.
(ii) The set $\mathcal{N}_x\otimes C_0$ is small.
(iii) $\mu_{W|X}[g_W](x) = 0$.
D 5 V{A h } £ £°° : lim^o A h [ : lim M0 / w),A h )\g w \ (w) dw = 0. 

Conditions D1-D3 and D5 are essentially rephrased versions of the conditions
used in Theorem 3.5. Condition D4 introduces stronger restrictions
on $g_W$. In Section 3, the boundedness and smallness were avoided by means
of a truncation technique which fit that situation. It is not obvious how
to find a similar truncation procedure in the dependent situation. Possibly,
the concept of asymptotic independence (to be introduced in Definition 4.1)
could be of use. In a simulation experiment in Section 5 with an unbounded
$g_W$ having noncompact support, we obtain results indicative of the asymptotics
being valid under the more general conditions on $g_W$ used in Section 3.

Condition D4(iii), which contains the restriction on the dependence relationship
between $\{X_t\}$ and $\{W_t\}$, at first sight seems very stringent, but
it will now be shown that it is, in fact, a natural extension of the type of
dependence that is used in standard linear cointegration theory. Since this
is important in an econometric interpretation of our results, we will consider
it in some detail.

We begin by defining the concept of asymptotic independence in this 
context. 

Definition 4.1. Suppose that $\{(X_t,W_t)\}$ is a null recurrent Markov
chain. The two marginal processes $\{X_t\}$ and $\{W_t\}$ are asymptotically independent
if the invariant measure $\pi_s$ factors into a product of two measures
which correspond to the $X$-component and the $W$-component.

If {Xt} and {Wt} are asymptotically independent, then the conditional 
expectation given by (4.3) reduces to a constant whenever g(y,w) = g{w) 
and D4(iii) follows if this constant is zero. 

It may seem that asymptotic independence is tantamount to requiring in- 
dependence, but this is not the case because having {Xt} nonstationary (and 
null recurrent) and {Wt} stationary is a special situation, where, intuitively, 
the "small" process {Wt} has little influence on the "big" process {Xt} in 
the long term, but allows for dependence for fixed t, as is the case for lin- 
ear cointegration models. This phenomenon is handled more formally in the 
following example, which extends well-known results in linear cointegration 
theory (see, e.g., [15], pages 586-589). 

Example 4.1 (Asymptotic independence). In this example, we prove 
asymptotic independence between a random walk and a stationary autore- 
gressive process, despite the fact that they are linked for each t. Moreover, 
we prove that conditions D1-D5 are satisfied. This means that the common 
invariant measure for these two processes factors as if the processes were 
independent. The processes are given by 

\[
\begin{aligned}
X_t &= X_{t-1} + e_t,\\
W_t &= aW_{t-1} + be_t + u_t, \qquad |a| < 1,
\end{aligned} \tag{4.5}
\]

where $\{e_t\}$ and $\{u_t\}$ are independent i.i.d. processes with finite third order
moments and distribution functions $F_e$ and $F_u$, respectively. Moreover, we
assume that these distribution functions have densities $f_e$ and $f_u$, respectively,
with respect to the Lebesgue measure on $\mathbb{R}$. In addition, we assume
that both densities are bounded away from zero on some interval $[-c,c]$
with $c > 0$. Let $\pi_W$ denote the stationary measure for $\{W_t\}$ and $p_W$ the
corresponding density. Likewise, let $\pi_X(dy) = dy$.

First, we find the density of the transition probability function for (4.5):

\[
\begin{aligned}
F_{X,W}(x,w \mid x_0,w_0) &:= P(X_1 \le x, W_1 \le w \mid X_0 = x_0, W_0 = w_0)\\
&= P(x_0 + e_1 \le x,\; aw_0 + be_1 + u_1 \le w)\\
&= \int P(x_0 + e_1 \le x,\; aw_0 + be + u_1 \le w \mid e_1 = e)\,F_e(de)\\
&= \int 1(e \le x - x_0)\,F_u(w - aw_0 - be)\,F_e(de)
\end{aligned}
\]

and

\[
f_{X,W}(x,w \mid x_0,w_0) = \frac{\partial}{\partial x}\frac{\partial}{\partial w}F_{X,W}(x,w \mid x_0,w_0)
= f_e(x - x_0)\,f_u(w - aw_0 - b(x - x_0)).
\]

The function $f_{X,W}(x,w \mid x_0,w_0)$ is the density of the compound transition
probability and from the assumption on $f_e$ and $f_u$, it follows that

\[
\inf_{(x,w,x_0,w_0)\in C^4}f_{X,W}(x,w \mid x_0,w_0) > 0 \tag{4.6}
\]

for some Lebesgue-positive compact set $C$ in $\mathbb{R}$. By (4.6), we can choose an
atom $s\otimes\nu$ which is equal to a constant times $1_C\otimes\ell_C$, where $\ell_C$ is the
restriction of the Lebesgue measure to the set $C$. In a similar way, we use
the definitions of $\{X_t\}$ and $\{W_t\}$ to get marginal minorization inequalities,
$P_i \ge s_i\otimes\nu_i$, where $P_1$ corresponds to the $X$-process and $P_2$ corresponds to
the $W$-process.

If $p_s$ is an invariant density, then $p_s$ satisfies

\[
p_s(x,w) = \int\!\!\int p_s(x_0,w_0)\,f_{X,W}(x,w \mid x_0,w_0)\,dx_0\,dw_0. \tag{4.7}
\]

On the other hand, if we can find a function $p_s$ which satisfies (4.7) such that
$\pi_s := \int p_s$ is an invariant measure with $\pi_ss = 1$, then this $p_s$ is the unique
invariant density satisfying $\pi_ss = 1$. We will show that

\[
p_s(x,w) := c^{-1}p_{s_2}(w), \qquad c = \int\!\!\int s(y,w)\,p_{s_2}(w)\,dy\,dw \tag{4.8}
\]

satisfies (4.7), where the constant $c$ is defined so that $\pi_s(s) = 1$. The measure
defined by (4.8) satisfies (4.7) iff $p_s = p_s'$, where

\[
p_s'(x,w) := \int\!\!\int c^{-1}p_{s_2}(w_0)\,f_{X,W}(x,w \mid x_0,w_0)\,dx_0\,dw_0. \tag{4.9}
\]

From (4.9), we get

\[
\begin{aligned}
p_s'(x,w) &= \int c^{-1}p_{s_2}(w_0)\Bigl\{\int f_{X,W}(x,w \mid x_0,w_0)\,dx_0\Bigr\}dw_0\\
&= \int c^{-1}p_{s_2}(w_0)\Bigl\{\int f_e(x - x_0)\,f_u(w - aw_0 - b(x - x_0))\,dx_0\Bigr\}dw_0\\
&= \int c^{-1}p_{s_2}(w_0)\Bigl\{\int f_e(e)\,f_u(w - aw_0 - be)\,de\Bigr\}dw_0\\
&= \int c^{-1}p_{s_2}(w_0)\,f_{W,W}(w \mid w_0)\,dw_0\\
&= c^{-1}p_{s_2}(w)\\
&= p_s(x,w),
\end{aligned}
\]

where we have used the fact that the transition probability density function
for $\{W_t\}$ is given by

\[
f_{W,W}(w \mid w_0) = \int f_u(w - aw_0 - be)\,f_e(e)\,de.
\]

Since $p_{s_1}$ is constant, this means that $p_s(x,w) = c_1p_{s_1}(x)p_{s_2}(w)$, where $c_1$ is a
constant, hence the two marginal processes are asymptotically independent.

Let $g_W$ be any bounded, real, measurable function defined on $\mathbb{R}$ with
compact support. By definition of the model, we have that D1 is satisfied.
Since $\{X_t\}$ is a random walk with a smooth noise process possessing a finite
third order moment, the random walk is $\beta$-null recurrent. Since we have
established asymptotic independence, condition D2(iv) becomes trivial and,
likewise, condition D3. From (4.6), and since $g_W$ is assumed to be small,
we infer that D4 holds. The last condition, D5, holds since the transition
probability function is smooth. Thus, conditions D1-D5 are satisfied.
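
The key step in Example 4.1 is that integrating the compound transition density over $x_0$ leaves the transition density of the W-process, whatever the value of x. The sketch below checks this numerically under the added assumption that both innovation densities are standard normal, in which case the W-transition density is normal with mean $aw_0$ and variance $b^2 + 1$. The Gaussian choice, the integration window and the function names are assumptions made only for this illustration.

```python
"""Numerical check of the marginalization step in Example 4.1 (sketch)."""
import math


def normal_pdf(x: float, mean: float = 0.0, var: float = 1.0) -> float:
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def joint_transition_density(x: float, w: float, x0: float, w0: float,
                             a: float, b: float) -> float:
    """f_e(x - x0) * f_u(w - a*w0 - b*(x - x0)) with N(0, 1) innovations (assumed)."""
    e = x - x0
    return normal_pdf(e) * normal_pdf(w - a * w0 - b * e)


def integrate_out_x0(x: float, w: float, w0: float, a: float, b: float,
                     half_width: float = 10.0, steps: int = 4000) -> float:
    """Trapezoid approximation of the integral of the joint density over x0."""
    step = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x0 = x - half_width + i * step
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * joint_transition_density(x, w, x0, w0, a, b)
    return total * step


def w_transition_closed_form(w: float, w0: float, a: float, b: float) -> float:
    """Transition density of W under the Gaussian assumption: N(a*w0, b^2 + 1)."""
    return normal_pdf(w, mean=a * w0, var=b * b + 1.0)


if __name__ == "__main__":
    a, b, w, w0 = 0.5, 0.7, 0.3, -1.2
    for x in (0.0, 37.5):
        print(x, integrate_out_x0(x, w, w0, a, b), w_transition_closed_form(w, w0, a, b))
```

The printed values agree for both choices of x, which is the numerical counterpart of the invariant density being flat in the x-direction.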

Remark 4.1. The assumption on the $\{W_t\}$-process can be relaxed in
this example. It is sufficient that $\{W_t\}$ is a stationary, nonlinear, autoregressive
process. In (4.5), we may also replace the constant $b$ with a measurable
function $\psi$, with $E\psi^4(e)$ and $\sup_{|e|\le c}|\psi(e)|$ finite. On the other hand, the calculations
made in the example are based on the linearity of the $\{X_t\}$-process,
with one interesting exception. Let $\{X_t\}$ be given by (4.5). Suppose that

\[
X_t' = \Phi(X_t),
\]

where $\Phi$ is a bijective measurable map between $E_1$ and $E_1'$. Then the processes
$\{X_t'\}$ and $\{W_t\}$ are asymptotically independent.

Suppose that $e_t' := be_t + u_t$ is bounded. Then $\{W_t\}$ is uniformly recurrent
(cf. [24], Example 5.6, page 93) and we can use the fact that $g_W(w) = w$.
Imposing appropriate conditions, the uniform recurrence still holds in the
nonlinear case.
The specialization in the next example makes the connection to linear
cointegration even more explicit.

Example 4.2. If the residuals in (4.5) are Gaussian, then we can calculate
the conditional expectation for fixed $t$, and the rate at which we
approach asymptotic independence and the fulfillment of D4(iii) for Example 4.1.
With $X_0 = 0$ and $\sigma_e^2 = \operatorname{Var}(e_t)$,

\[
E(W_t \mid X_t) = \frac{E(W_tX_t)}{E(X_t^2)}\,X_t, \qquad E(X_t^2) = t\sigma_e^2,
\]

and

\[
\beta_t := E(W_tX_t) = b\sigma_e^2\,\frac{1 - a^t}{1 - a} = O(1).
\]

Hence,

\[
E(W_t \mid X_t) = [\beta_t/\sigma_e^2]\,[t^{-1}X_t] = o(1) \quad\text{a.s.}
\]

by the strong law of large numbers. Likewise, it follows that the instantaneous
correlation between $X_t$ and $W_t$ decreases toward zero,

\[
\operatorname{corr}(X_t,W_t) = O(t^{-1/2}).
\]

However, $\{(X_t,W_t')\}$ is not Gaussian, where $W_t' = g_W(W_t)$.
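
The rate at which the instantaneous correlation vanishes can be verified with exact second-moment recursions for model (4.5). The sketch below assumes $X_0 = W_0 = 0$ and mean-zero innovations with variances `sigma_e2` and `sigma_u2`; these starting values and all function names are assumptions introduced for the illustration. Under them, $E(W_tX_t)$ obeys $c_t = ac_{t-1} + b\sigma_e^2$ and $\operatorname{Var}(W_t)$ obeys $v_t = a^2v_{t-1} + b^2\sigma_e^2 + \sigma_u^2$.

```python
"""Exact second moments for the linked random walk / AR(1) pair in (4.5) (sketch)."""
import math


def cross_moment(t: int, a: float, b: float, sigma_e2: float = 1.0) -> float:
    """E(W_t X_t) from the recursion c_t = a * c_{t-1} + b * sigma_e2, c_0 = 0."""
    c = 0.0
    for _ in range(t):
        c = a * c + b * sigma_e2
    return c


def cross_moment_closed_form(t: int, a: float, b: float, sigma_e2: float = 1.0) -> float:
    """Closed form b * sigma_e2 * (1 - a^t) / (1 - a) of the same recursion."""
    return b * sigma_e2 * (1.0 - a ** t) / (1.0 - a)


def correlation(t: int, a: float, b: float,
                sigma_e2: float = 1.0, sigma_u2: float = 1.0) -> float:
    """corr(X_t, W_t) with Var(X_t) = t * sigma_e2 and Var(W_t) from its recursion."""
    var_w = 0.0
    for _ in range(t):
        var_w = a * a * var_w + b * b * sigma_e2 + sigma_u2
    var_x = t * sigma_e2
    return cross_moment(t, a, b, sigma_e2) / math.sqrt(var_x * var_w)


if __name__ == "__main__":
    a, b = 0.5, 0.7
    for t in (100, 400, 1600):
        # sqrt(t) * corr settles at a constant, i.e. corr is of order t^{-1/2}
        print(t, correlation(t, a, b), math.sqrt(t) * correlation(t, a, b))
```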

4.3. Asymptotic results. After clarifying the relationship between various
$\pi$-measures in Lemma 4.1, the main result is stated in Theorem 4.1. We
denote by $\pi_{s_1}$ the invariant measure for $\{X_t\}$ implicitly defined by D2 and
we write $\pi_s^X$ for the $X$-marginal invariant measure of the compound chain
defined by D3.

Lemma 4.1. Assume that D1 and D2 are satisfied. Then the compound
process is $\beta$-null recurrent and

TT^IC*! = x • 



Proof. Let $1_{C_1\times E_2}$ be $\pi_s$-integrable according to D2(iv). Let $C_2 \subseteq C_1$
such that $C_2$ is a small set for the $\{X_t\}$-chain and $\xi = 1_{C_2\times E_2} \in \mathcal{E}^+$. Since
$\nu$ is a small measure and $\xi$ is $\pi_s$-integrable, the conditions in the ratio limit
theorem (cf. [24], page 130) are satisfied and we get

\[
\frac{\sum_{t=0}^{n}\nu P^t\xi}{\sum_{t=0}^{n}\nu P^ts} = \frac{\pi_s\xi}{\pi_ss} + o(1) = \pi_s\xi + o(1).
\]

By D2(i), we have that

\[
\nu_XP_1^t(dx) = P_\nu(X_t \in dx, W_t \in E_2) = \nu P^t(dx\times E_2)
\]

so that

\[
\nu P^t\xi = \nu_XP_1^t1_{C_2}.
\]

Then

\[
\frac{\sum_{t=0}^{n}\nu_XP_1^t1_{C_2}}{\sum_{t=0}^{n}\nu P^ts} = \frac{\pi_s\xi}{\pi_ss} + o(1) = \pi_s\xi + o(1). \tag{4.10}
\]

On the other hand, since $C_2$ is small for the $\{X_t\}$-chain, we have

Combining these two asymptotic relations gives 

Since the left-hand side does not depend on the actual $C_2$, it follows that

\[
\pi_{s_1}1_C = c_0^{-1}\pi_s1_{C\times E_2}, \qquad C \in \mathcal{E}_1,
\]

for a fixed constant $c_0$. The constant can be expressed as $c_0 = \pi_s^Xs_1$. The
denominator of the fraction on the left-hand side of (4.10) has exactly the
same asymptotic rate as the numerator. Hence, the compound chain is $\beta$-null
recurrent since $\{X_t\}$ is $\beta$-null recurrent (cf. KT). □
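
For intuition about what $\beta$-null recurrence means for a random walk, the expected number of returns to a fixed state can be computed exactly in the simplest case. The sketch below uses a simple symmetric walk with steps of plus or minus one, which is an assumption made for tractability (the example in the text uses innovations with a density); the function name is hypothetical. The expected number of visits to the origin up to time 2m grows like $2\sqrt{m/\pi}$, that is, like the square root of the time horizon, in line with $\beta = 1/2$ for random walks as discussed in KT.

```python
"""Expected number of visits to the origin of a simple symmetric random walk (sketch)."""
import math


def expected_visits_to_origin(m: int) -> float:
    """E[#{0 <= t <= 2m : S_t = 0}] for a walk started at 0 with +/-1 steps."""
    prob = 1.0   # P(S_0 = 0)
    total = 1.0
    for k in range(1, m + 1):
        # P(S_{2k} = 0) = C(2k, k) / 4^k, updated by its ratio to the previous term
        prob *= (2 * k - 1) / (2 * k)
        total += prob
    return total


if __name__ == "__main__":
    for m in (100, 10000, 1000000):
        print(m, expected_visits_to_origin(m), 2.0 * math.sqrt(m / math.pi))
```

Quadrupling the horizon roughly doubles the expected number of visits, so the amount of data available near any given point grows much more slowly than n.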

The following result is a modification of Theorem 3.5, which allows dependence
between processes $\{W_t\}$ and $\{X_t\}$ in (3.1).

Theorem 4.1. Assume D1-D5. Moreover, assume that the kernel $K$
satisfies B1-B2 and that for some $\varepsilon > 0$, the bandwidth satisfies $h_n^{-1} \ll n^{\beta-\varepsilon}$.
Then for all initial measures $\lambda$,

\[
h_n^{1/2}S_n^{1/2}(K_{x,h_n})\Bigl\{\hat f(x) - f(x) - \mu(g_{h_n})\frac{S_n(s)}{S_n(K_{x,h_n})} - \frac{\pi_s^XI_{K_{x,h_n}}\psi_x}{\pi_s^XK_{x,h_n}}\Bigr\}
\stackrel{d}{\to} \mathcal{N}(0,\sigma_{g_W}^2(x)\|K\|_2^2), \tag{4.11}
\]

where $\sigma_{g_W}^2(x)$ is given by (4.4).

If the density $p_s^X$ and the function $f$ possess continuous derivatives of second
order, then the second bias term $\pi_s^XI_{K_{x,h_n}}\psi_x/\pi_s^XK_{x,h_n}$ is negligible when
$h_n^{-1} \gg n^{\beta/5+\varepsilon}$. If $p_s^{(2,0)} = \partial^2p_s/\partial y^2$ exists and satisfies $\int\lim_{y\to x}|p_s^{(2,0)}(y,w)| \times
|g_W|(w)\,dw < \infty$, then the first bias term is negligible when $h_n^{-1} \gg n^{\beta/5+\varepsilon}$.
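
One practical use of a limit result of this type is a pointwise confidence interval. The sketch below assumes that both bias terms are negligible, that the convention $K_{x,h}(y) = h^{-1}K((y-x)/h)$ holds so that $h\sum_tK_{x,h}(X_t) = \sum_tK((X_t - x)/h)$, and that an estimate of the conditional variance is supplied by the user. The function names, the Epanechnikov kernel and the plug-in variance are hypothetical choices and not part of the paper.

```python
"""Pointwise confidence interval based on a random normalization (sketch)."""
import math
from statistics import NormalDist
from typing import Callable, Tuple


def epanechnikov(u: float) -> float:
    """Compactly supported kernel; an assumed choice."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_l2_norm_sq(kernel: Callable[[float], float],
                      lo: float = -1.0, hi: float = 1.0, steps: int = 20000) -> float:
    """Trapezoid approximation of the integral of K(u)^2 over [lo, hi]."""
    step = (hi - lo) / steps
    total = 0.5 * (kernel(lo) ** 2 + kernel(hi) ** 2)
    for i in range(1, steps):
        total += kernel(lo + i * step) ** 2
    return total * step


def confidence_interval(f_hat: float, kernel_count: float, sigma2: float,
                        k_norm_sq: float, level: float = 0.95) -> Tuple[float, float]:
    """Interval f_hat +/- z * sqrt(sigma2 * ||K||_2^2 / kernel_count).

    kernel_count stands for h * sum_t K_{x,h}(X_t) = sum_t K((X_t - x) / h).
    """
    if kernel_count <= 0.0:
        raise ValueError("no observations in the kernel window")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(sigma2 * k_norm_sq / kernel_count)
    return f_hat - half_width, f_hat + half_width


if __name__ == "__main__":
    norm_sq = kernel_l2_norm_sq(epanechnikov)
    print(norm_sq)
    print(confidence_interval(1.0, kernel_count=100.0, sigma2=1.0, k_norm_sq=norm_sq))
```

The width of the interval is driven by the number of observations that actually fall near x, not by the nominal sample size n.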
Proof. The proof of this result can be seen as a modification of the
proof of Theorem 3.5. That proof was built on Theorem 3.1, which, in turn,
was based on C1-C7. By D1 and Lemma 4.1, the $\{(X_t,W_t)\}$-process is
$\beta$-null recurrent.

Let $g_h = K_{x,h}\otimes g_W$, $g_h^0 = \mu(g_h)s$, $\theta_h = K_{x,h}\cdot[\psi_x]$, $\theta_h^0 = K_{x,h}\cdot[\psi_x - a_h]$,
$\psi_x = f - f(x)$ and $a_h = \pi_s^XI_{K_{x,h}}\psi_x/\pi_s^XK_{x,h}$. Then

O n \J^x,h) Jn[J^x,h) ^n{J^x,h) 

In this notation, the left-hand side of (4.11) equals

\[
\{hS_n(K_{x,h})\}^{1/2}\Bigl\{\frac{S_n(g_h - g_h^0)}{S_n(K_{x,h})} + \frac{S_n(\theta_h^0)}{S_n(K_{x,h})}\Bigr\}. \tag{4.12}
\]

As noted in the proof of Theorem 3.5, it is enough to prove that

\[
S_n^{-1/2}(K_{x,h_n})\,h_n^{1/2}\,S_n(g_{h_n} - g_{h_n}^0) \stackrel{d}{\to} \mathcal{N}(0,\sigma_{g_W}^2(x)\|K\|_2^2)
\]

since the second term of (4.12) is $o_P(1)$.
By D4(i)-(ii), $|g_h|$ is a small function and

\[
\begin{aligned}
\mu(g_h) = \pi_sg_h &= \pi_s(K_{x,h}\otimes g_W)\\
&= \int\!\!\int p_s(x+hu,w)K(u)g_W(w)\,dw\,du\\
&= \int p_s^X(x+hu)K(u)\,\mu_{W|X}[g_W](x+hu)\,du\\
&= o(1),
\end{aligned} \tag{4.13}
\]

where we have used the fact that D3 implies both $p_s^X$ and $\mu_{W|X}[g_W]$ are
continuous at the point $x$ and D4(iii), which ensures that the generalized
conditional expectation is zero at $x$.
We also find that

\[
\mu(|g_h|) = p_s^X(x)\,\mu_{W|X}[|g_W|](x) + o(1).
\]

Let $g_h' = g_h - g_h^0$. Then since $\mu(g_h') = 0$,

\[
\sigma^2(g_h') = \pi_sg_h'^2 + 2h^{-1}\Lambda^*(g_h',hg_h'), \qquad \Lambda^*(g_h,f_h) := \pi_sI_{g_h}PG_{s,\nu}f_h, \tag{4.14}
\]

using (A.12). By (4.13) and (4.14),

\[
h\pi_sg_h'^2 + 2\Lambda^*(g_h',hg_h') = h\pi_sg_h^2 + 2\Lambda^*(g_h,hg_h) + o(1).
\]

Hence, an asymptotic variance, $\sigma^2 := \lim_{h\downarrow 0}h\sigma^2(g_h')$, if it exists, is given by

\[
\sigma^2 = \lim_{h\downarrow 0}\{h\pi_sg_h^2 + 2\Lambda^*(g_h,hg_h)\}. \tag{4.15}
\]
In order to verify (4.15), we begin by showing that the first term on the
right-hand side of (4.15) satisfies

\[
h\pi_s(g_h^2) = \|K\|_2^2\,\sigma_{g_W}^2(x)\,p_s^X(x) + o(1),
\]

where the conditional variance is given by (4.4).
Indeed, by the definition of $g_h$, we find that

\[
\begin{aligned}
\pi_s(g_h^2) &= \int\!\!\int p_s(y,w)K_{x,h}^2(y)g_W^2(w)\,dy\,dw\\
&= h^{-1}\int\!\!\int p_s(x+hu,w)K^2(u)g_W^2(w)\,du\,dw\\
&= h^{-1}\Bigl\{\|K\|_2^2\int p_s(x,w)g_W^2(w)\,dw + o(1)\Bigr\}\\
&= h^{-1}p_s^X(x)[\|K\|_2^2\,\sigma_{g_W}^2(x) + o(1)].
\end{aligned}
\]

The next task is to show that $\Lambda^*(g_h,hg_h)$ is asymptotically negligible.
Let

\[
f_h(y,w) := [hK_{x,h}(y) - 1_{\{x\}}(y)K(0)]\,g_W(w) = [1_{\{x\}^c}\cdot hK_{x,h}\otimes g_W](y,w)
\]

so that

\[
hg_h = f_h + K(0)[1_{\{x\}}\otimes g_W] = f_h + \delta_h,
\]

say.

By D3(i), we find that $\pi_s|\delta_h| = 0$ and thus $\pi_sPG_{s,\nu}\delta_h = 0$. Hence,

\[
\Lambda^*(g_h,hg_h) = \Lambda^*(g_h,f_h) = \int\!\!\int p_s(x+hu,w)K(u)g_W(w)\,PG_{s,\nu}f_h(x+hu,w)\,du\,dw,
\]

where we have inserted the invariant density and made a standard substitution.
Let

\[
\eta_h = \|K\|_\infty\,G_{s,\nu}\{1_{\{x\}^c}\cdot 1_{\mathcal{N}_x(h)}\}\otimes|g_W|
\]

so that

\[
|\Lambda^*(g_h,f_h)| \le \int\!\!\int p_s(x+hu,w)K(u)|g_W(w)|\,P\eta_h(x+hu,w)\,du\,dw. \tag{4.16}
\]

By D4(i)-(ii) and Nummelin ([24], Proposition 5.13, page 80), the function
$\eta_h$ is bounded. Since $\{1_{\{x\}^c}\cdot 1_{\mathcal{N}_x(h)}\}\otimes|g_W| \downarrow 0$ pointwise, $\eta_h(y,w) \downarrow 0$ as $h \downarrow 0$.
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Let e > and = f {rj^ > e}. Then {^4^} satisfies D5. Inserting 77^ = 
lAjJih + lA h Vh into (4.16), we get 

\A*(gh,fh)\ < e J J p s (x + hu,w)K(u)\g w (w)\dudw 
+ ||%||oo / / Ps(x + hu,w)K(u)\g w (w)\ 



x P((x + hu, w),Ah) dudw. 
The main part of the last term of the above expression is bounded by

$$\iint p_s(x+hu,w)K(u)\Big\{\sup_{|x-y|\le\epsilon_h}P((y,w),A_h)\Big\}|g_W(w)|\,dw\,du \tag{4.17}$$

for $\epsilon_h = h\sup\{|u| : u \in \mathcal{N}_0\}$, with $\epsilon_h \downarrow 0$ as $h \downarrow 0$.

Using D4, it follows that (4.17) is $o(1)$ with respect to h. Putting all of
this together, it is clear that

$$\lim_{\epsilon\downarrow 0}\lim_{h\downarrow 0}|A^*(g_h, f_h)| = 0.$$

Thus, we have so far proved that

$$h\sigma^2(g_h) = h\pi_s(g_h^2) + o(1) = \|K\|_2^2\,\sigma^2_{g_W}(x)\,p_f(x) + o(1).$$

We must also check $h\sigma^2(|g'_h|)$. This quantity is given by

$$h\sigma^2(|g'_h|) = h\pi_s|g'_h|^2 + 2A^*(|g'_h|, h|g'_h|) - h\pi_s^2|g'_h| - 2h\pi_s|g'_h|\,\pi_s(s\cdot|g'_h|),$$

by (A.12) of Appendix A. Since 7r s |^| < ^(^1 + |/x g J = ir s \g h \ + o(l) and 
■^sbhl = C(l), we have 

$$h\sigma^2(|g'_h|) = h\pi_s|g'_h|^2 + 2A^*(|g'_h|, h|g'_h|) + o(1).$$

By the same arguments as those given above, we find that

$$h\sigma^2(|g'_h|) = h\pi_s|g'_h|^2 + o(1) = h\pi_s|g_h|^2 + o(1).$$

Since $g_h$ is small, we easily find that (cf. Theorem A.1 in Appendix A)

$$\mathrm{E}|U(g_h) - \pi_s(g_h)|^{2m} \le d_m h^{-2m+1}, \qquad m \ge 1,$$

and

$$\mathrm{E}|U(|g_h|) - \pi_s|g_h||^{2m} \le d'_m h^{-2m+1}, \qquad m \ge 1.$$

Moreover, we have h\g h \ < g , g d = cq\ Mx and Pa(^o(5o) < 00) = 1. 

Thus, the assumptions in Theorem 3.1 are satisfied and (4.11) holds. It is 
straightforward to verify that the bias terms are negligible under the given 
conditions (cf. KT). □ 
Remark 4.2. If $\int p_s(y,w)g_W(w)\,dw = 0$, then $\mu(g_h) = 0$, by D3(iii). If
this assumption holds, then the stochastic bias correcting term in (4.11) is
zero. If D4(iii) is strengthened so as to also require asymptotic independence,
then $\sigma^2_{g_W}(x) = \mathrm{E}\,g_W^2(W_t)$.

5. Simulations and finite sample behavior. Estimates similar to that in 
(1.1) have appeared in the cointegration literature. Our contribution, which 
we believe to be new, is that we have singled out classes of processes and 
assumptions for which an asymptotic theory of these estimates can be con- 
structed, such that it should be possible to work out confidence intervals and 
bands (and possibly rigorous tests of nonlinear cointegration, in the sense 
discussed in this paper). 

The purpose of this section is to illustrate the small-sample properties of
the estimator $\hat f(x)$ defined by (1.1), using simulations.

A problem not encountered in the stationary case is that the simulated 
realizations may cover very different x-regions. Hence, for a fixed x = x',
close to the starting value $X_0 = 0$, say, of each realization, some realizations
may have many observations in the neighborhood of x', whereas other
realizations may have none in the vicinity of x' for the sample size we are
considering. This kind of behavior does not occur in the stationary case, 
where the expected time until the process reaches x' is always finite and, 
in practice, small when \x'\ is small. This means that in a finite-sample ap- 
proximation of the asymptotics, we can either keep x fixed and wait until 
we have sufficiently many observations close to x or we can choose a central 
realization-dependent value of x (e.g., the modal value of the sample) for 
studying the normalized ratio (3.2) of Theorems 3.5 and 4.1. We have chosen 
to adopt both procedures, although, clearly, we introduce some extraneous 
stochastics into the problem in the latter case. 
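
The uneven coverage described above is easy to reproduce. The following is a minimal sketch, assuming a Gaussian random walk started at 0 and the interval (5, 10) around x' = 7.5; the function names, the sample size and the seed are illustrative choices, not part of the original experiments.

```python
import random

def visits_near(x0, half_width, n, rng):
    """Count the steps at which a Gaussian random walk started at 0
    lies within half_width of x0 during its first n steps."""
    x, count = 0.0, 0
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        if abs(x - x0) <= half_width:
            count += 1
    return count

def visit_counts(x0=7.5, half_width=2.5, n=500, realizations=200, seed=0):
    """Visit counts for several independent realizations (hypothetical settings)."""
    rng = random.Random(seed)
    return [visits_near(x0, half_width, n, rng) for _ in range(realizations)]

counts = visit_counts()
empty = sum(c == 0 for c in counts)
print(f"realizations with no visits to (5, 10): {empty} of {len(counts)}")
print(f"smallest and largest visit counts: {min(counts)}, {max(counts)}")
```

The spread between the smallest and largest counts is what motivates either waiting for a fixed number of observations near x or moving x to a realization-dependent central value.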

A difficult and largely unresolved problem is that of choosing a proper 
bandwidth. Theorem 3.5 and Theorem 4.4 of KT only give the allowable 
rate as n tends to infinity. It should be noted that these rates are different 
from those in the stationary case, n effectively being replaced by $n^\beta$. In
practice, we have found it useful to use cross-validation and to let the bandwidth
h depend on x. In fact, we have typically let $h_n$ be proportional to
$\{T_C(n)p_C(x)\}^{-1/5}$, where $p_C(x)$ could be thought of as the locally estimated
density and where it is known from KT (Lemma 3.4) that $T_C(n)$ essentially
behaves as $n^\beta$.

The approximation to normality as a function of sample size, for the
quantity

$$\Big(h_n\sum_t K_{x,h_n}(X_t)\Big/\int K^2(u)\,du\Big)^{1/2}[\hat f(x) - f(x)], \tag{5.1}$$

Fig. 1. (a) Thick line: The standard normal pdf. Thin lines: The estimated pdfs for
the quantity $(h_n\sum_t K_{x,h_n}(X_t)/\int K^2(u)\,du)^{1/2}[\hat f(x) - f(x)]$, at the point x = 7.5, derived from
the cointegrated system $(X_t, Z_t; t \ge 1)$, where $X_t = X_{t-1} + e_t$, $Z_t = f(X_t) + \varepsilon_t$, $e_t$ and
$\varepsilon_t$ are independent i.i.d. $\mathcal{N}(0,1)$ variables and f(x) = x for all real x. The quantity is
estimated by 1000 realizations and a particular realization is admitted into the evaluation
as, respectively, 100, 200, 300, 500 and 800 observations are accumulated in the interval
(5,10). (b) Thick line: The standard normal pdf. Thin lines: The estimated pdfs for the
same quantity as in (a), but where a particular realization is admitted into the evaluation
at the modal value. The length of the time series is 500, 1000 and 3000, respectively.

derived from the simple cointegrated system 

$$X_t = X_{t-1} + e_t, \qquad Z_t = X_t + W_t, \qquad e_t \text{ and } W_t \text{ independent} \sim \mathcal{N}(0,1)$$

at the point x = 7.5, is shown in Figure 1(a). 1000 realizations have been used 
and a particular realization is admitted into the evaluation as, respectively, 
100, 200, 300, 500 and 800 observations are accumulated in the interval 
(5,10). For Figure 1(b), on the other hand, a fixed point x has not been 
used; rather, x has been taken to be the modal value and is thus varying 
from one realization to another. In this case, the length of the time series is 
500, 1000 and 3000, respectively. 
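
A rough version of the modal-value experiment can be sketched as follows. This is an illustration under stated assumptions, not the code behind Figure 1: it assumes that (1.1) is the kernel-weighted ratio estimator, uses an Epanechnikov kernel, a fixed bandwidth h = 1 in place of cross-validation, and a crude histogram mode. All function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter

INT_K2 = 0.6  # integral of K(u)^2 over [-1, 1] for the Epanechnikov kernel

def kernel(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def simulate_system(n, rng):
    """X_t = X_{t-1} + e_t, Z_t = X_t + W_t, with independent N(0,1) e_t and W_t."""
    x, xs, zs = 0.0, [], []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        xs.append(x)
        zs.append(x + rng.gauss(0.0, 1.0))
    return xs, zs

def normalized_error(xs, zs, x0, h, f=lambda x: x):
    """(h * sum_t K_{x0,h}(X_t) / int K^2)^{1/2} * (f_hat(x0) - f(x0))."""
    weights = [kernel((x - x0) / h) / h for x in xs]  # K_{x0,h}(X_t)
    total = sum(weights)
    f_hat = sum(w * z for w, z in zip(weights, zs)) / total
    return math.sqrt(h * total / INT_K2) * (f_hat - f(x0))

def crude_mode(xs):
    """Centre of the most populated unit-width bin, a stand-in for the modal value."""
    return float(Counter(round(x) for x in xs).most_common(1)[0][0])

def monte_carlo(n=500, realizations=200, h=1.0, seed=1):
    """Normalized errors at the crude mode for independent realizations."""
    rng = random.Random(seed)
    values = []
    for _ in range(realizations):
        xs, zs = simulate_system(n, rng)
        values.append(normalized_error(xs, zs, crude_mode(xs), h))
    return values

values = monte_carlo()
print("sample mean:", statistics.mean(values))
print("sample standard deviation:", statistics.stdev(values))
```

If the normal approximation is reasonable, the sample mean should be near 0 and the sample standard deviation near 1; a histogram of `values` plays the role of the thin lines in Figure 1(b).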

In Figures 2(a) and 2(b), we have considered (5.1) for the system 

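$$X_t = X_{t-1} + e_t, \qquad Z_t = X_t + W_t, \qquad W_t = \sqrt{0.5}\,\varepsilon_t + \sqrt{0.5}\,e_t, \qquad \varepsilon_t \text{ and } e_t \text{ independent} \sim \mathcal{N}(0,1) \tag{5.2}$$

to test the asymptotics in the case of dependence between $\{X_t\}$ and $\{W_t\}$,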
as described in Section 4. As in Figure 1(b), x is taken to be the modal 
value in Figure 2(b). For both Figure 1 and Figure 2, it is seen that the 
finite sample distribution gets reasonably close to the asymptotic normal 
distribution. 
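
The dependence built into (5.2) can be checked numerically. The sketch below assumes the reading of (5.2) in which $W_t$ shares the innovation $e_t$ that drives $X_t$, each component having weight $\sqrt{0.5}$; under that reading $W_t$ has unit variance and correlation $\sqrt{0.5}$ with $e_t$. The function names and the sample size are illustrative.

```python
import math
import random

def simulate_dependent(n, seed=0):
    """Simulate X_t = X_{t-1} + e_t, W_t = sqrt(0.5)*eps_t + sqrt(0.5)*e_t,
    Z_t = X_t + W_t, with independent standard normal e_t and eps_t."""
    rng = random.Random(seed)
    c = math.sqrt(0.5)
    x = 0.0
    xs, zs, ws, es = [], [], [], []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        x += e
        w = c * eps + c * e
        xs.append(x)
        zs.append(x + w)
        ws.append(w)
        es.append(e)
    return xs, zs, ws, es

def correlation(a, b):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

xs, zs, ws, es = simulate_dependent(20000)
var_w = sum(w * w for w in ws) / len(ws)
rho = correlation(ws, es)
print("sample variance of W_t:", var_w)
print("sample correlation of W_t and e_t:", rho)
```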
Fig. 2. (a) Thick line: The true transfer function f(x) = x - 5. Dots are $z_t$ plotted
against $x_t$, $t \ge 1$. We have 500 observations. Thin line: Estimated transfer function
$\hat f$, built on 500 observations from the cointegrated system $(X_t, Z_t; t \ge 1)$, where
$X_t = X_{t-1} + e_t$ and $Z_t = f(X_t) + W_t$, where $W_t = \sqrt{0.5}\,\varepsilon_t + \sqrt{0.5}\,e_t$ for $t \ge 1$. $(e_t, \varepsilon_t)$
are i.i.d. $\mathcal{N}(0, I)$ vectors for $t \ge 1$ and I is the identity matrix. Finally, f(x) = x - 5 for
x real. (b) Thick line: The standard normal pdf. Thin lines: The estimated pdfs for the
quantity $(h_n\sum_t K_{x,h_n}(X_t)/\int K^2(u)\,du)^{1/2}[\hat f(x) - f(x)]$, derived from the cointegrated system
$(X_t, Z_t; t \ge 1)$, where $X_t = X_{t-1} + e_t$ and $Z_t = f(X_t) + W_t$, where $W_t = \sqrt{0.5}\,\varepsilon_t + \sqrt{0.5}\,e_t$
for $t \ge 1$. $(e_t, \varepsilon_t)$ are i.i.d. $\mathcal{N}(0, I)$ vectors for $t \ge 1$ and I is the identity matrix. Finally,
f(x) = x - 5 for x real. The quantity is estimated by 1000 realizations and a particular
realization is admitted into the evaluation at the modal value. The length of the time series
is 500, 1000 and 3000, respectively.



Note that $\{X_t\}$ and $\{W_t\}$ in (5.2) are asymptotically independent with
$\mu_{W|X}[g_W](x) = 0$ and $\sigma_{g_W}(x) = 1$. Actually, in (5.2), $g_W(W_t) = W_t$ and the
assumptions D4(i) and D4(ii) are not satisfied, this being something we
wanted to test by means of this simulation experiment. On the other hand, 
with the exception of the independence assumption, the other assumptions 
in Theorem 3.5 are satisfied. We also carried out an experiment with 

In this case, {(X t ,Wt)} is not Markov. On the other hand, {(Xt, Wt, et)} 
is Markov and {Xt} is asymptotically independent of the Markov process 
{(Wt,et)}. The distributional results were similar to those of Figure 2(b). 
More simulation experiments and a real data example are given in [20]. 



6. Some final remarks on nonlinear cointegration. This paper can be 
looked at in two ways: (i) it is an attempt to establish a statistical theory for 
nonparametric regression with a nonstationary regressor and (ii) in addition, 
it is seeking to relate this framework to the problem of nonlinear cointegration.
There are a host of open problems for both. For (ii), it is of particular
interest to weaken conditions D4(ii) and D4(iii) on $g_W$, alternatively, letting
$W'_t = g_W(X_t, \ldots, X_{t-p}, W_t)$, as indicated in the second paragraph of Section
3. But, there are also conceptual issues involved concerning the function f.
In a nonparametric approach like ours, f is determined by the data and
if {Zt} and {Xt} are close to being linearly cointegrated, one expects the
nonparametric estimate $\hat f$ to be close to a linear function and might think
that the difference between $\hat f$ and a linear function could be used to test for
linearity of the cointegration. One could also test for appropriate parametric
functions for f. For the estimation of nonlinear parametric regression functions
using local time arguments, see [25]. Clearly, not every parametric f
makes sense from a cointegration framework. As an extreme case, consider
f(x) = constant. Then {Zt} will be stationary and unrelated to {Xt}. In
a cointegration framework, {Zt} should be nonstationary and the question
arises as to whether it is possible to construct nontrivial functions f such
that {Zt} is stationary, even though {Xt} is nonstationary. Another question
is whether all such functions f (e.g., the sine function) will be economically
meaningful.

One of the referees has pointed out that the function f may include a
constant term. But, a deterministic term depending on the time parameter
(e.g., a linear trend) is not included in the model. An extension of the model
in this direction introduces challenging problems concerning properties of
estimates for both f and the trend. It seems to be quite clear that additional
assumptions on {Xt} are required since null recurrence itself is not related
to the growth rate of a linear trend. The situation is much more specific in
the random walk case, where the variance of {Xt} increases linearly and f
is linear.

Still another issue is whether f should be required to be one-to-one for it
to be meaningful in a cointegration framework. Requiring f to be one-to-one
has the advantage of allowing the possibility of expressing {Xt} in terms of
{Zt}, making for a more symmetric relationship. To estimate such an inverse
relationship would be nontrivial since it would require an extension of the
theory to the case where the regressor is a function of a Markov chain.

In the linear cointegration case, the concept of cointegration is intimately 
connected with the so-called error correction representation (cf. [18]). Non- 
linear extensions have centered on both nonlinear error correction and non- 
linear cointegration (see, e.g., [7, 8, 13, 17]). It remains to explore possible 
connections between these models and the approach presented in this paper. 

Nonlinear cointegration extensions are more demanding and are at the 
core of the present paper. Only a few attempts of such an extension can 
be found in the literature. Specific nonlinear cointegration relationships in 
terms of threshold models have been studied by Hansen and Seo [16] and
Bec and Rahbek [3]. Escribano and Mira [8] suggest definitions of I(0) and
I(1) which are useful in a parametric nonlinear context and study several
large- and small-sample properties of nonlinear least squares estimation.
Related work in nonlinear parametric regression theory has appeared in Park 
and Phillips [25]. Nonparametric estimates of nonlinear cointegration have 
been computed from data by Granger and Hallman [12] and Aparicio and 
Escribano [1]. However, no attempt has been made to study the asymptotic 
properties of nonparametric estimators either for nonlinear error correction 
or cointegration models. 

APPENDIX A 

In this appendix, we assume that $\{X_t\}$ is an aperiodic, $\phi$-irreducible
Markov chain with state space $(E, \mathcal{E})$, where $\mathcal{E}$ is countably generated.
We also assume that the transition probability P satisfies the minorization
inequality (2.2), that is, $P \ge s\otimes\nu$, and that $\{X_t\}$ is Harris recurrent.
Recall the taboo transition probability $H = P - s\otimes\nu$; the taboo kernel
$G_{s,\nu} = \sum_{\ell=0}^{\infty}H^{\ell}$; the index set $\Delta_r^m = \{\alpha \in \mathcal{N}_+^r : \sum_j\alpha_j = m\}$; the multinomial
coefficient $\binom{m}{\alpha} = \frac{m!}{\alpha_1!\cdots\alpha_r!}$ and the moment function

$$\psi_{r,\alpha} = \sum_{j\in\mathcal{N}_{0+}^r} H^{j_1}I_{g_{s_1}^{\alpha_1}}\cdots H^{j_r}I_{g_{s_r}^{\alpha_r}}1. \tag{A.1}$$

The r-Cartesian product of the set of integers where all but the first coordinate
are strictly positive is denoted by $\mathcal{N}_{0+}^r$.

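A.1. Higher-order moments. An expression for the moments of a U-block
is derived from a moment formula for a real sequence (cf. [21]):

Lemma A.1. Let $\{a_t\}$ be a real sequence and $m \ge 1$ an integer. Then

$$\Big(\sum_{t=0}^{n}a_t\Big)^m = \sum_{r=1}^{m}\sum_{\alpha\in\Delta_r^m}\binom{m}{\alpha}\sum_{j\in\mathcal{N}_{0+}^r:\ s_r\le n}a_{s_1}^{\alpha_1}\cdots a_{s_r}^{\alpha_r},$$

where $s_\ell = j_1 + \cdots + j_\ell$, $\ell = 1, \ldots, r$.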
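
The moment formula of Lemma A.1 can be checked by brute force on short sequences. The sketch below assumes the reading of the lemma in which the inner sum runs over strictly increasing time indices $0 \le s_1 < \cdots < s_r \le n$ and $\alpha$ ranges over r-tuples of positive integers summing to m; the function names are illustrative.

```python
from itertools import combinations
from math import factorial, prod

def compositions(m, r):
    """All r-tuples of positive integers (alpha_1, ..., alpha_r) with sum m."""
    if r == 1:
        yield (m,)
        return
    for first in range(1, m - r + 2):
        for rest in compositions(m - first, r - 1):
            yield (first,) + rest

def power_sum_expansion(a, m):
    """Evaluate the right-hand side of the moment formula for the sequence a."""
    n = len(a) - 1
    total = 0
    for r in range(1, m + 1):
        for alpha in compositions(m, r):
            coef = factorial(m) // prod(factorial(k) for k in alpha)
            # s = (s_1 < ... < s_r) corresponds to j_1 >= 0, j_2, ..., j_r >= 1
            for s in combinations(range(n + 1), r):
                total += coef * prod(a[i] ** k for i, k in zip(s, alpha))
    return total

a = [1, 2, 3]
print(power_sum_expansion(a, 3), sum(a) ** 3)
```

For a = (1, 2, 3) and m = 3, the four index patterns contribute 36, 93, 51 and 36, which add up to $6^3 = 216$.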

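Theorem A.1. Let $g = \{g_t\}$ be a sequence of real-valued measurable
functions defined on E. Let $U_0 = U_0(g) \stackrel{\mathrm{def}}{=} \sum_{k=0}^{\tau}g_k(X_k)$. Then

$$\mathrm{E}_xU_0^m = \sum_{r=1}^{m}\sum_{\alpha\in\Delta_r^m}\binom{m}{\alpha}\psi_{r,\alpha}(x).$$

Remark A.1. If $g_k \equiv g$, then

$$\psi_{r,\alpha} = \sum_{j\in\mathcal{N}_{0+}^r}H^{j_1}I_{g^{\alpha_1}}\cdots H^{j_r}I_{g^{\alpha_r}}1 = [G_{s,\nu}I_{g^{\alpha_1}}H]\cdots[G_{s,\nu}I_{g^{\alpha_{r-1}}}H]\,G_{s,\nu}I_{g^{\alpha_r}}1.$$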



Proof of Theorem A.l. The main ingredient in this proof is the 
lemma formulated above, together with the Markov property. 

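Let $\mathcal{B}_s = \mathcal{F}_s^X\vee\mathcal{F}_{s-1}^Y$, $A_s = (1 - Y_s)$, $B_{0,s} = 1\{\tau \ge s\} = \prod_{k=0}^{s-1}A_k$ and $B_{t,t+h} = \prod_{k=t}^{t+h-1}A_k$. From Lemma A.1, the definition of $\Delta_r^m$ and $a_k = g_k(X_k)1(\tau \ge k)$
with $n = \infty$, we get

$$U_0^m = \sum_{r=1}^{m}\sum_{\alpha\in\Delta_r^m}\binom{m}{\alpha}J_{r,\alpha},$$

with

$$J_{r,\alpha} = \sum_{j\in\mathcal{N}_{0+}^r}g_{s_1}^{\alpha_1}(X_{s_1})\cdots g_{s_r}^{\alpha_r}(X_{s_r})B_{0,s_r}. \tag{A.2}$$
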

Let r and a be fixed and fk(%) *== (3;) for k = 1, . . . ,r. Then 

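$$J_r = J_{r,\alpha} = \sum_{j}Z_j, \qquad Z_j = f_{s_1}(X_{s_1})\cdots f_{s_r}(X_{s_r})B_{0,s_r}.$$

It is enough to prove that

$$\mathrm{E}_xJ_r = \sum_{j}H^{j_1}I_{f_{s_1}}\cdots H^{j_r}I_{f_{s_r}}1(x) \tag{A.3}$$

for arbitrary r and $\{f_k\}$. We will prove this by induction on r. When r = 1,

$$J_1 = \sum_{j_1=0}^{\infty}f_{j_1}(X_{j_1})B_{0,j_1} = \sum_{j_1=0}^{\infty}f_{j_1}(X_{j_1})1(\tau \ge j_1)$$

and

$$\mathrm{E}_xJ_1 = \sum_{j_1=0}^{\infty}\mathrm{E}_x\{f_{j_1}(X_{j_1})1(\tau \ge j_1)\} = \sum_{j_1=0}^{\infty}H^{j_1}I_{f_{j_1}}1(x),$$

which shows that (A.3) is true for r = 1.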

Assume that (A. 3) is true for r — 1. Corresponding to the induction hy- 
pothesis, let J= (Ji, . . . ,j r -i) and s = s r _i. Then 



(A. 4) Zj = f si (X Sl )■■■ f Sr (X Sr )B 0;Sr = ZjAsfs+jr ( X s+jr)Bs+l, 



S+jr ) 
where $Z_JA_s$ is $\mathcal{B}_{s+1}$-measurable. Taking conditional expectation with respect
to $\mathcal{B}_{s+1}$ in the last part of (A.4) gives

^{fs+j r {Xs+j r )Bs + l,s+jr I ^ S + l) = ^{f S+j r { X S+j r ) B s + 1 ,S+j r I ^S + l} 

= ^X s + 1 {fs+jr( X ir-l) B Q,jr-l} 



so that 



(A.5) 



bl(X s _ +1 ) d ^ J2 nfs+ jr (Xs +jr )Bs + i,s +jr I Bs+i} 

j r =\ 

[jr = l ) 



Combining (A.4)-(A.5), we obtain 



(A.6) 



E x J r = E x Y ZjAsfJiXs+i) 

= E X Y Zf,[As<l>l(Xs+i)\Bs 

=e x Y z?f>s(Xs), 



where 



(X a ) d ^ f E{A a </>° s (X a+1 ) | Bs) = E x M{Xi){l - Y )} = H<f>° s (X s ). 



The conditional step above reduces the dimension of j and it remains to 
verify that (A. 3) is correct when we apply the induction hypothesis. We 
look at f s (x) = defined in (A.4). Let /°(x) = f s (x)H(j) s (x). Then 



(A.7) 



I fS l(x)=fl(x) = I f£ 



L>=1 



(x). 



By (A.6), the product (A. 2) is reduced from r to r — 1 since, using the fact 
that s = s r -i, we have 

(A.8) E x J r = E x Y fsAX sl )---f Sr _ 2 (X Sr _ 2 )fl_ 1 (X Sr _ 1 ). 

Hence, by (A.8), we can evaluate the expectation of J r by the induction 
hypothesis, which, together with (A.7), gives (A. 3). □ 



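Corollary A.1. Let $U_0(a,g) = \sum_{k=0}^{\tau}a_kg(X_k)$. Then

$$\mathrm{E}_xU_0^m(a,g) = \sum_{r=1}^{m}\sum_{\alpha\in\Delta_r^m}\binom{m}{\alpha}\Big\{\sum_{j\in\mathcal{N}_{0+}^r}d_{j,\alpha}(x)\,a_{s_1}^{\alpha_1}\cdots a_{s_r}^{\alpha_r}\Big\}, \tag{A.9}$$

where $d_{j,\alpha} = H^{j_1}I_{g^{\alpha_1}}\cdots H^{j_r}I_{g^{\alpha_r}}1$. In particular, for m = 1, 2, we have that

$$\begin{aligned}
\mathrm{E}_xU_0(a,g) &= \sum_{j=0}^{\infty}d_j(x)a_j, \qquad d_j = H^jI_g1, \\
\mathrm{E}_xU_0^2(a,g) &= \sum_{j=0}^{\infty}a_j^2H^jI_{g^2}1(x) + 2\sum_{j=0}^{\infty}\sum_{\ell=1}^{\infty}a_ja_{j+\ell}\,d_{j,\ell}(x), \qquad d_{j,\ell} = H^jI_gH^{\ell}I_g1.
\end{aligned} \tag{A.10}$$
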

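Proof. We obtain (A.9) from (A.1) with $g_j = a_jg$. With m = 2, we have

$$\begin{aligned}
\mathrm{E}_xU_0^2(a,g) &= \sum_{\alpha\in\Delta_1^2}\binom{2}{\alpha}\psi_{1,\alpha}(x) + \sum_{\alpha\in\Delta_2^2}\binom{2}{\alpha}\psi_{2,\alpha}(x) = \psi_{1,2}(x) + 2\psi_{2,(1,1)}(x) \\
&= \Big[\sum_{j_1=0}^{\infty}H^{j_1}I_{g_{j_1}^2}\Big]1(x) + 2\Big[\sum_{j_1=0}^{\infty}H^{j_1}I_{g_{j_1}}\sum_{j_2=1}^{\infty}H^{j_2}I_{g_{j_1+j_2}}\Big]1(x) \\
&= \sum_{j=0}^{\infty}a_j^2H^jI_{g^2}1(x) + 2\sum_{j=0}^{\infty}\sum_{s=1}^{\infty}a_ja_{j+s}H^jI_gH^sI_g1(x).
\end{aligned}$$

Hence,

$$\mathrm{E}_\nu U_0^2(a,g) = \sum_{j=0}^{\infty}a_j^2\{\nu H^jg^2\} + 2\sum_{j=0}^{\infty}\sum_{s=1}^{\infty}a_ja_{j+s}\{\nu H^jI_gH^sg\}.$$
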

Remark A.2. In particular, if $a_j \equiv 1$, we write

$$U = U(g) = \sum_{k=0}^{\tau}g(X_k)$$

and (A.10) gives the formulas $\mathrm{E}_\nu U_0(g) = \pi_sg$ and

$$\begin{aligned}
\mathrm{E}_\nu U_0^2(g) &= \pi_sg^2 + 2\pi_sI_gHG_{s,\nu}g \\
&= \pi_sg^2 + 2\pi_sI_gPG_{s,\nu}g - 2\pi_sI_sg\,\pi_sg.
\end{aligned} \tag{A.11}$$

□

Let $\mu(g) = \mathrm{E}_\nu U(g)$ and $\sigma^2(g) = \mathrm{Var}(U(g))$. Then

$$\mu(g) = \pi_sg, \qquad \sigma^2(g) = \pi_sg^2 - \pi_s^2g + 2\pi_sI_gPG_{s,\nu}g - 2\pi_sI_sg\,\pi_sg. \tag{A.12}$$
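
The block-moment formulas can be checked numerically on a finite-state chain. The sketch below is an illustration under assumptions that are not in the text: the chain has an atom (state 0), so that one may take $s = 1_{\{0\}}$ and $\nu = P(0,\cdot)$, the block ends at the first visit to the atom, and the kernel $G_{s,\nu}$ is approximated by a truncated Neumann series. The function names are hypothetical.

```python
def mat_vec(M, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def apply_G(H, v, terms=2000):
    """Approximate G v = sum_{l >= 0} H^l v by a truncated Neumann series."""
    total, term = list(v), list(v)
    for _ in range(terms):
        term = mat_vec(H, term)
        total = [a + b for a, b in zip(total, term)]
    return total

def block_moments(P, g, atom=0):
    """First two moments of U = sum_{k=0}^{tau} g(X_k) under nu,
    computed as pi_s g and pi_s g^2 + 2 pi_s I_g H G g."""
    nu = P[atom]
    # H = P - s (x) nu with s the indicator of the atom: zero out the atom's row
    H = [[0.0] * len(P) if i == atom else list(row) for i, row in enumerate(P)]

    def pi_s(f):
        return sum(n * x for n, x in zip(nu, apply_G(H, f)))

    HGg = mat_vec(H, apply_G(H, g))
    first = pi_s(g)
    second = pi_s([x * x for x in g]) + 2.0 * pi_s([x * y for x, y in zip(g, HGg)])
    return first, second

# With identical rows the chain is i.i.d.; for g = 1 the block length is
# geometric with success probability q = 0.25.
P = [[0.25, 0.5, 0.25]] * 3
mean_u, second_u = block_moments(P, [1.0, 1.0, 1.0])
print(mean_u, second_u, second_u - mean_u ** 2)
```

In this example the block length is geometric with q = 0.25, so the mean is 1/q = 4, the second moment is (2 - q)/q^2 = 28 and the variance is 12, which is what the two formulas return.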
A.2. Moment inequality.

Lemma A.2. Assume that (2.2) holds. Let $p > 1$ and $\eta \in (0,1)$ and let f
be a real-valued measurable function defined on E. Then for any probability
measure $\lambda$,

AG^I/l^caEf^+^-^jsupA^'l/r, 

i>o 

$$t = p/(1 + \eta(p-1)), \qquad q = p/(\eta(p-1)),$$

where $c_2$ is a universal constant dependent only on p and $\eta$.

Proof. Let $q' = p/(p-1)$, $r = q'/(1-\eta)$, $w = 1/q'$, $v = 1/q'$ and $u = 2/q'$.
Then $u = v + w$, $p^{-1} + q^{-1} + r^{-1} = 1$, $1/t = 1/p + 1/q$, $pu = 2(p-1)$
and $qv = \eta^{-1}$.
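
The exponent bookkeeping in this step is pure arithmetic and can be verified exactly. The sketch below assumes the relations $q' = p/(p-1)$, $r = q'/(1-\eta)$, $q = p/(\eta(p-1))$, $t = p/(1+\eta(p-1))$, $u = 2/q'$ and $v = w = 1/q'$, and checks the stated identities with rational arithmetic; the helper name is illustrative.

```python
from fractions import Fraction

def holder_exponents(p, eta):
    """Return the exponents used in the proof for given p > 1 and 0 < eta < 1."""
    q_prime = p / (p - 1)
    return {
        "q_prime": q_prime,
        "r": q_prime / (1 - eta),
        "q": p / (eta * (p - 1)),
        "t": p / (1 + eta * (p - 1)),
        "u": 2 / q_prime,
        "v": 1 / q_prime,
        "w": 1 / q_prime,
    }

p, eta = Fraction(3), Fraction(1, 4)
e = holder_exponents(p, eta)
print(e)
print("1/p + 1/q + 1/r =", 1 / p + 1 / e["q"] + 1 / e["r"])
print("1/t - (1/p + 1/q) =", 1 / e["t"] - (1 / p + 1 / e["q"]))
print("p*u =", p * e["u"], " q*v =", e["q"] * e["v"], " r*w =", e["r"] * e["w"])
```

Since $rw = 1/(1-\eta) > 1$, a series of the form $\sum_j j^{-rw}$ converges, which is presumably the role of the third Hölder exponent.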

From the right-hand side of (2.10) and by the Holder inequality, we obtain 



Gi u \f\(x) 



< 



< 



EMi(T>i)|/|(^)} 

3 

E^ /P (^>i)^ /9 {I/I 9 (^)} 

3 

^l /p (r>j){P j \f\ q (x)}^ 



L 3 



Y.^l /p ^>Mr v {p ] \f\ q {x)}^){r w ) 

3 

/ \ t/p / \ t/q i 

< \Y,n(r>j)j (Y,r vq p j \f\ q (x)j (£ 



t/r 



J 



= c l V t/p Z t / q , 

say. We apply the Hölder inequality again with $p_1 = p/t$ and $q_1 = q/t$. This
gives



AG^J/I <c 1 \V t/p Z t / q 

< Cl [A* /p y][A* /9 Z] 



and 



ci 
lt/pr oo



<C 2 



E/^(r>j) 
lj=0 



3=0 



3=0 

t/i 



lj=0 



t/q 



t/P 



t/p 



E-/ " xrJ f 

3=0 

sup\ t/q P j \f\ q 

j>0 



t/q 



3>0 



Co, 



Er wr 

3=0 



t/r 



j=0 



t/q 
APPENDIX B

Proof of Lemma 3.2. Let $H_j = P_j - s_j\otimes\nu_j$ for j = 1, 2. We begin by
showing that



(B.l) 

where 



{W T i}±{W.} when A = A, 



(B.2) A = A 2 $ Al , $ Al =^{AiFf Sl }P|, P=P 2 ^. 

£=0 

In order to prove (B.l), it is enough to show that for all integers r > and 
for all Ai G £~2~, 

(B.3) P A (W T i eio,-, W r i e A) = e4-^ r e AO- 

Let fco = Jo and fc^ = jo + ji + h for £ = 0, . . . , r. We have 

P A (W T ie4r--W r T ieA0 

oo oo oo 

= E E •••E P A 2 (^c,e A---^ fcr GA.)ff ) A 1 (r 1 =io,...,T r 1 =>) 

i0=0jl = l jr = l 

OO OO oo 

= E E ■■■Y.{^ p 2 I A ---pi T iA r i}{{\iH^ Sl }b n ...b 3r } 

jo=Oji=l >=1 



XI Ao PI Al ---PI Ar l, 
where $b_\ell = \nu_1H_1^{\ell-1}s_1$, $\ell > 0$. Hence, (B.3) holds.
From

$$P \ge (s_2\otimes\nu_2)\Phi_{\nu_1} = s_2\otimes\nu_2\Phi_{\nu_1},$$

we obtain the minorization inequality (3.12). Let $H = P - s\otimes\nu$. Then

$$H = P_2\Phi_{\nu_1} - s_2\otimes\nu_2\Phi_{\nu_1} = (P_2 - s_2\otimes\nu_2)\Phi_{\nu_1} = H_2\Phi_{\nu_1}, \qquad Q = Q_2\Phi_{\nu_1},$$

where $Q_2$ (in terms of $H_2$, $s_2$) and Q (in terms of H, s) are defined by (2.3).
The next task is to prove (3.13), that is, 

{ W T i } = { W k } when A = A, 

where denotes the split chain generated by P and ( s, v). Let P denote 

the transition probability function for this split chain and let P ' denote the 
transition probability for {VF r i}. We must prove that 

k 

First, we recall the structure of a split chain. Suppose that P is a transition 
probability which satisfies P > s®u. Then the corresponding split chain has 
transition probability P, which satisfies, for n > 1, 

P n (x x y ,dx xy)= y uP n ^ 1 (dx){ys(x) + (1 - y)(l - s(x))} 

+ (1 - y )QP n ^(x , dx){ys(x) + (1 - y)(l - s(x))}. 

In our case, this gives, for n = 1, 

P(w x y ,dw x y) = y u(dw){ys(w) + (1 - y)(l - s (w))} 
(B.4) +(l-y )Q(w ,dw) 

x {ys(w) + (l-y)(l-s(w))}. 

We more carefully consider P' , which by (B.2) satisfies 

00 

1=1 

We replace P 2 by the right-hand side of the expression 

H{wq x Vo, dwxy) = y v 2 P$- 1 (dw){ys 2 (w) + (1 - y)(l - s 2 (w))} 

+ (l-yo)Q2Pt 1 (^o,dw) 

x {ys 2 (w) + (1 - y)(l - s 2 (w))}, 
where

oo oo 

(B.5) b^Pt 1 = "2**1 =\L, E hQiPt 1 = Q2*vi = 9 

e=i 1=1 

We then obtain (3.13) from (B.4)-(B.5). The first equality in (3.14) fol- 
lows from (3.13) and the second is the occupation formula given by (2.10). 
Actually, when A = Ai x 7T2, we get 

00 

A = vr 2 $ Al = ^{Aii?fsi}vr 2 P^ = 7r 2 {AiG Sl)I/1 si} = vr 2 . 

1=0 

Finally, if A = v = v\ x v-i , then A = v and v G =t^2 since tt = ir S2 . □ 



Proof of Lemma 3.3. The waiting times {Sj, j > 0} are given by 

-1-7-1 

3 3- 



5j = Tj — tJ_ 1 . Let b n ^ = P{5\ + ■ ■ ■ + 5 n = k) for k > n and bi^ = b^. Then 



-Vi.fc 



V\H\ for n = 1 and k > 1, 



b™, for n > 1 and k > 1, 

where denotes n-times convolution. The n-step transition probability 
P is given by 

oo 

(B.6) P n =HKn+3 P Z +j , n>l. 

3=0 

Since $\{W_t\}$ is geometrically ergodic ([24], Theorem 6.14, page 120), there exist
a nonnegative function M such that $\pi_2(M) < \infty$ and a constant $\rho \in (0,1)$
such that

$$\|P_2^n(x,\cdot) - \pi_2\| \le M(x)\rho^n, \qquad x \in E,\ n \ge 0.$$

Thus, by (B.6), 

oo 

\\P n ( X >-) -7T|| <J2K,n+j\\P? + \x,-) -TT\\ 
j=0 
oo 

<y t K,n+ j M(x)p n+ ^ 

(B.7) j=0 

oo 
j=0 

<M(x)p n . 

Hence, by (B.7), is geometric ergodic. 
For the ergodic $\{W_t\}$, we have



a t = sup 6 e (A,B), Q 1 (A,B)=k 2 1 a P? 1 I b \-k 2 \ A 'k 2 \b. 

A,BgE 



Here, 



9 e (A,B) = kiIaP^b! - ^tXa^b = S £J ) ^A 1aP 1 1b ~ ^a^b} 

oo 

That is, 

oo oo 

(B.8) q t < Y] btj sup 0j(A, B) = Y] hjCtj < ag. 

j=l WE j=£ 

By [5], in general, 

oo 

^£ k a e <oo => E n2 TQ +1 < oo. 
e=i 

By (B.8) it follows that 

oo oo 

^2i h ai <oo => ^2i k q £ <oo. 
t=\ t=\ 

Hence, (3.15) is true. □ 

Remark B.l. We see that a, < E[a(<5i H Y&i)\- A sharper inequality 

would be q e < atp/p + 0(1), and if this inequality is correct, then 

oo oo 
i=\ t=\ 